<<

© 2016 Nature America, Inc. All rights reserved. A A full list of affiliations appears at the end of the paper. leading to biodiversity. The genome evolved comparatively mechanisms evolutionary illuminates and duplication genome after osts tele­ of group sister unduplicated the represents lineage its because ‘gar’; to henceforth biomedicine teleost connect human biology, the we genome of sequenced help ( To identification. ortholog frustrates that evolution, ohnolog of rates asymmetric to due often many human humans including duplication genome from derived cates ref. see (VGD1 and VGD2 (ref. duplication genome vertebrate early human biology ( medaka) and (for zebrafish example, for human models disease important provide Teleost represent about half of all living vertebrate of with expression identified apparent Numerous the content lineage To Di Federica Palma T Chris Amemiya Bronwen Aken Peter F Stadler W Litman Gary Mario Fasold Cristian Cañestro Alison P Lee Michael S Campbell Jeremy Pasquier Ingo Braasch and facilitates human The spotted gar genome illuminates vertebrate evolution OPEN Nature Ge Nature Received 12 September 2015; accepted 12 February 2016; published online 7 March 2016; corrected after print 25 April 2016;

Gar informs the evolution of vertebrate genomes and functions

vertebrate connect

19 evolution

subfunctionalization. , 2 0 1

(

diverged 1 and Fig. 1 Fig.

) followed by reciprocal loss of some ohnologs (gene dupli (gene of loss ) by ohnologs some followed reciprocal using

in n

conserved levels human etics

size human 2–

genomes of a 14 9

Fig. 1 1 ). 24 . Connecting teleost genes and gene functions to to functions gene and genes teleost Connecting . gar:

Supplementary Fig. 1 Fig. Supplementary , immunity,

, , Tetsuya Nakamura 34

24 from many 13

10 for , 15 34 26

6 , , , Andrew R Gehrke , , Han Wang VOLUME 48 | NUMBER 4 | APRIL 2016 APRIL | 4 NUMBER | 48 VOLUME biology 1

, ,

1 , , Angel Amores 11 functional , Mikio , IshiyamaMikio 4 a genome-wide

, , Ronda T Litman 19 6 29 duplicated ; (ii) the TGD, which resulted in duplicates of duplicates in resulted which TGD, the (ii) ;

) can be challenging given (i) the two rounds of 8 ; ; and (iii) rapid teleost sequence evolution

noncoding , , Mark Yandell ,

, , , Byrappa Venkatesh 34 and 9 entire 20 , 34 , , Jessica Alföldi , , Jason Sydes

, , Daniel Barrell The mineralization to the

before studies gar function

teleost 22

elements

biomedical

association , genome 23 1

, , John S Taylor and and 1

teleost 2 uncovered ) in teleosts and , tetrapods, and teleosts in ) , , Thomas Desvignes 9

genes

25 2 , , Igor Schneider 1 of 26 , Domitille , Chalopin Domitille , , Felix E G Beaudry 2 , , Jeremy Johnson

, , Jeramiah J Smith

and Supplementary Note Supplementary 8 human (CNEs; , , Masato Mikami

provides

, , Kerstin Lindblad-Toh from 10

genome

often models,

studies. 14 , development 11 - , , Peter W H Holland

, , Kyle J Martin conserved bony

often teleost teleost comparisons regulatory

approximate

a

L. L. oculatus duplication

21 we Transcriptomic resource

vertebrate

cis , , Quenton Fontenot 1

0 sequenced 31

), ), but 1 regulatory) 17

and , , Jeffrey A Yoder roles (mediated,

, 8 sequences. 1 1 , , Steffi Kehr 27 8 , , Peter Batzel ­ )

3 21 , , for

, , Kazuhiko Kawasaki

, , Tatsuya Ota 15 the

, , Yi Sun

(TGD). for 12 ancestors.

LG sequenced to was 90× Louisiana coverage in usingcollected female Illumina gar technology.adult single a The of ALLPATHS genome The Genome RESULTS disease, health, evolution. and human physiology to development, biology teleost bridges genome duplication, gar genome the of legacy the illuminating By TGD. the after gar gene, as expected if ancestral regulatory elements were partitioned TGD Global comparisons. gene show expression analyses that sequence expression domains and levels for direct by detected not are that but share humans and teleosts that regulatory, often are which CNEs, of identification the facilitates Notably,gar years. million 450 for pods some with tetra many have conserved entire gar been chromosomes slowly and clarifies the evolution and oforthology problematic teleost , understanding 34 ,

34 patterns

the such

, , Shaohua Fan 2 analyses 8

1 , , John F Mulley undetectable

, draft assembly covers 945 Mb with quality metrics comparable metrics quality with Mb 945 covers assembly draft - 33 for 12

generated duplicates usually sum to those for the corresponding

genome The - & John H Postlethwait

, , Yann Guiguen 22 coding and microRNA (miRNA) gene families. Surprisingly, and microRNA coding (miRNA) gene families. cryptic

example,

24 assembly ,

23

Gar

slowly and 1 , , Marcia Lara

, , Jana Hertel 30 showed , , Julian Catchen 17 28

, Allyse , Ferrara Allyse

, bridges of

, , Nil Ratan Saha 18 levels CNEs, evolution

, , Jean-Nicolas Volff spotted

evolving

in by

16 and

that direct

, Hox, of

13 34 4

facilitating teleosts , , Tereza Manousaki

, , Vydianathan Ravi , , Dustin Wcisel annotation expression

the 6 gar after

, , Julien Bobe 24

8 gar ParaHox human-teleost , , John H Letaw

,

sums 34 (

Lepisosteus oculatus

genome 7 to genome , , Michael J Beam , , Aaron M Berlin 30

annotation tetrapods 29 1

, , Stephen M J Searle of

for

, Louise , WilliamsLouise doi:10.1038/ng.352 and

expression

gar

has duplication, 15

microRNA 6 17

, , Axel Meyer

, , Neil H Shubin comparisons

genes,

conserved , by 18

1 of 5 14 , ,

, s e l c i t r A illuminating

sequences ,

domains

consistent ), 1

8 genes).

,

the 6 , whose

in

become 8

origin 16 ,

10 and

,

32

,

2 427 , ,

­ -

© 2016 Nature America, Inc. All rights reserved. lineage. gar the in altered were that traits ancestral be may or gent and teleosts by shared that morphologies ( 6 Fig. teleosts to Supplementary group sister the as bowfin) and (gar Holostei and and our transcriptome of the bowfin jawed vertebrates Phylogenies of 243 one The NoteSupplementary 2 Figs. ing TE phylogenetic origins ( coelacanth of that to similar and a TE and profile TE superfamilies teleost elements (TEs) representing most lobe transposable including repetitive, is genome protein tein Note and 2,595 noncoding RNAs ( of the MAKER annotations), 42 pseudogenes identified protein18,328 annotation Ensembl and genes high 21,443 of struction of a gene set annotated by MAKER ( stages developmental and tissues adult from ( (LGs) groups linkage 29 in bases assembled scaffolds to a meiotic map some blies assem vertebrateIllumina for other those to S the also See (Teleostei). teleosts the lineage, sister their to compared as bowfin) and (gar Holostei of evolution slow and monophyly the ( node) each at number (second analysis likelihood maximum- from support bootstrap and node) each at number (first probability posterior as shown is support Node fishes. cartilaginous on CAT the + under + GTR PhyloBayes using vertebrates (gnathostome) jawed 25 from ratio orthology a one-to-one with 243 for positions acid amino 97,794 of alignment an from inferred phylogeny ( (MYA). ago years million 450 fishes lobe-finned and ray-finned of divergence the before occurred that VGD2) and (VGD1 duplication genome vertebrate of rounds earlier two the after evolution clarifying by human, including tetrapods, and coelacanth, as such vertebrates, lobe-finned to teleosts connects Gar TGD. the before stickleback, and medaka platyfish, zebrafish, models biomedical major the including fishes, teleost from diverged that ( genomes. 1 Figure s e l c i t r A 428 lution upeetr Note Supplementary upeetr Note Supplementary upplementary Data Data upplementary Darwin applied his term ‘living fossil’ to ‘ganoid fishes’, to fossil’ including ‘living term his applied Darwin

-

3 Supplementary Data Set Supplementary 2 gar coding genes) and zebrafish (25,642 (25,642 zebrafish and genes) coding ), in comparison to human (20,296 pro human to comparison (20,296 in ), 1 - 1 3 ; indeed, gars ; show evo low indeed, and rates phenotypic of speciation level level genome assembly . To generate a ‘chromonome’ (chromo S 2 – -

. . Evolutionary rate analyses using cartilaginous fish outgroups upplementary Fig. 6 Fig. upplementary

coding genes). About 20% of the gar gar the of 20% About genes). coding 5 Spotted gar bridges vertebrate vertebrate bridges gar Spotted lineage , , a ) Spotted gar is a ray-finned fish fish a ray-finned is gar ) Spotted upeetr Tbe 1 Tables Supplementary - coding genes coding (mostly a subset -

ofdne protein confidence 1 evolved 7 , including the gar genome S et ). Γ - . fclttd h con the facilitated ) to 4 model with rooting rooting with 4 model and and ). Transcriptomes Transcriptomes ). - 2

one orthologs in 25 0 ). The tree shows shows tree The ). slowly , capturing 94% of b 2 Supplementary Note Supplementary Supplementary Supplementary Supplementary Supplementary 2 ) Bayesian ) Bayesian 2 ), strongly supported ), the strongly monophyly supported of 4 ), ), we anchored , thus clarify thus , calva - – - coding coding finned 3 and and 2 3 ­ ­ ­ ­ ­

( Supplementary Note 29 b a , ) 3 25– 0 1.0, 100 may be conver 1.0, 100 2 8 1.0, 67 , suggesting suggesting , 1.0, 100 Fig. 1 Fig. Teleosts TGD 1.0, 100 b ­ ­ ,

1.0, 100 ( vertebrate bony evolving slowest the verte coelacanth, bony except other brates most than outgroup cartilaginous the to length branch shorter substantially a had Holostei sequences. teleost than showed that gar and bowfin proteins have evolved significantly slower coelacanth and even rate to as low in TEs compared turnover teleosts, evolution sequence rate of teleost high results support the hypothesis that the TGD could have the facilitated 1.0, 100 Fig. 1 Fig. Biomedical models 1.0, 100 Spotted catshark Elephant 1.0, 100 Spotted gar 1.0, 100 Little skate Coelacanth b 1.0, 100 1.0, 100 , , 1.0, 100 Supplementary Table 4 Table Supplementary 1.0, 100 2 1.0, 100 VOLUME 48 | NUMBER 4 | APRIL 2016 APRIL | 4 NUMBER | 48 VOLUME 4 ( 1.0, 100 1.0, 100 Supplementary Fig. 5 Fig. Supplementary Coelacanth Turtle 1.0, 100 1.0, 100 0.55, 1.0, 100 Turkey Chicken Bowfin – Zebra finch Platypus Tetrapods Ray-finned fishes Lizard Opossum Tammar wallaby Spotted gar Human Elephant Dog 1.0, 100 Armadillo

Western clawedfrog L

o

Mouse b

e

-

Chinese brownfrog f

i n

n

and and e

d

Zebrafish

and and f

i

s h

17 e

Supplementary Note Supplementary s , Fugu 450 MY 18 Substitutions persite Supplementary Note Supplementary , 3 4

. Gar TEs also showed a showed . also Gar TEs B

Nile tilapia

o

A n 0.2

y

v

Nature Ge Nature

e

r

t

e

b

r

a

t

e s VGD2 n ). Our Our ). VGD1 etics ). 17 ,

3 Holostei Teleostei 3 ­

© 2016 Nature America, Inc. All rights reserved. , indicating rearrangements before the TGD ( TGD the before rearrangements indicating chromosome, gar one than more with synteny conserved shows usually teleosts in gar verifying segments, ohnologous two have usually teleosts segment, after, and possibly as a result of, the TGD the number of interchromosomal rearrangements occurred in teleosts available. yet are chromonomes no which for coelacanth in microchromosomes with macro the of both karyotype of common bony ancestor gar vertebrate and possessed chicken the that evidence strong is content gene and size ( lengths assembly chromosome correlated highly with chromosomes gar in tionship gar karyotype (14/29 chromosomes) showed a nearly one the of half Almost translocations. or fusions fissions, large 17 about ( mosomes gar 8 Figs. Supplementary ( species all in segments of orthologous conservation tinct dis highlighted teleosts and chicken human, of those to mosomes Fig. Supplementary 7 type (2 of the TGD. effects The the gar confounding without karyo analyses long time first the for allowing vertebrate, jawed Gar represents the first chromonome Gar the See teleosts. in microchromosomes of pre-TGD absence such of the explain examples fusions Multiple chromosome rediploidization. and rearrangements intrachromosomal subsequent and TGD by chromosome fusion gar, followed the from of divergence duplication after teleosts to leading lineage the in fusion chromosome by explained is which medaka Ola12, with and Ola9 synteny chromosomes double-conserved possess chromosomes three All Loc21-Gga17. and Loc20-Gga15 microchromosomes and macrochromosome Loc2-GgaZ example, gar, for and chicken in conserved largely remain which of some microchromosomes, and macro- both contained ancestor vertebrate bony the of genome The teleosts. modern to leading evolution karyotype the ( genome. reference oriented arbitrarily the to respect in different a in displayed are chromosomes gar The Ola11). and Ola16 medaka to corresponds Loc11 gar example, (for genome teleost post-TGD a to gar of relationship synteny double-conserved ( correlated highly are synteny conserved one-to-one with chromosomes ( etc.). Gga11, and Loc23 Gga14, and Loc13 example, (for microchromosomes particularly chromosomes, entire many for conservation synteny one-to-one and years million 450 over genomes the of conservation strong ( chromosomes. right) (black, human and left) (colored, gar of synteny conserved ( annotations). chromosome for (see microchromosomes and macro- ( structure. 2 Figure Nature Ge Nature 8 Figs. Supplementary R d a Fig. 2 Fig. 2 ) The assembled chromosome lengths for gar and chicken chicken and gar for lengths chromosome assembled The ) 2 The gar chromonome also tests the hypothesis that an increase in in increase an that hypothesis the tests also chromonome gar The n = 0.97). ( 0.97). = - - =58 ees dvrec bfr te TGD the before divergence teleost chro of many entire conservation showed comparisons chicken

informs - d n and microchromosomes as Ohno microchromosomes and

= 58) contains both macro and and Spotted gar preserves ancestral genome genome ancestral preserves gar Spotted a Fig. 2 Fig. n ) The spotted gar karyotype consists of of consists karyotype gar spotted The ) etics e Supplementary Note Supplementary

) Gar-medaka comparison shows the overall one-to-two one-to-two overall the shows comparison Gar-medaka ) the - chicken comparisons, including macro including comparisons, chicken c

). The chicken and gar karyotypes differed only by only differed karyotypes gar and chicken The ). evolution

c d VOLUME 48 | NUMBER 4 | APRIL 2016 APRIL | 4 NUMBER | 48 VOLUME ) Gar-chicken comparison shows shows comparison Gar-chicken ) than they are in in are they than and and and Supplementary NoteSupplementary 9 9

b , , and of , , and ) Circos plot Circos )

bony S - Supplementary Note Supplementary Supplementary Note Supplementary ). This similarity in chromosome chromosome in similarity This ). upplementary Fig. 7 Fig. upplementary and microchromosomes ( b 2 b 2

vertebrate of a non and and

3 10 2 15

19 22 25 2 6

5* *

6 0 21 0 9 3

and cartilaginous fishes cartilaginous and 9

. . For each gar chromosome 2 Ec TGD Each . 9 5 11

c 20 hypothesized, consistent consistent hypothesized, showing showing 4

; asterisks indicate chromosomes inverted with with inverted chromosomes indicate asterisks ; 8* 17 7*

- 27 3* 26

, tetrapod, non

). ). Aligning gar chro

12

14

f - 16 karyotypes

) Gar-chicken-medaka comparisons illuminate illuminate comparisons Gar-chicken-medaka ) 24

range gene order order gene range 28

29

18 1* 13

23

9 - 16

- derived pair pair derived 20

). ). Gar shares - ). ). Strikingly, and micro and

e

to

3 2 Fig. 2b Fig.

- 21

one rela 15 11* Fig. 2 Fig.

Fig. 2 -

13 14

teleost teleost *

12 X*

3* 6* 28 22 *

26

4 22 Y

2* 15 13

7 20 16 4

21 18

3 23 5* 5 8

a

e

e 27 10 9

5 17 19

1

17

­ ­ ­ ­ ­ ­ *

, , , ,

8* * 12

18 25 11

c

9* 24 o a udpiae ray unduplicated an for Note for Hoxincluding repertoire, clustersgene ( gar gene and physiology development, vertebrate ( immunity in involved gene of families evolution the clarifying thus often tetrapods, and and teleosts in TGD lost the reciprocally were experience that ohnologs VGD2 and not VGD1 ancestral did retains lineage its because tive to human models biology ost biomedical orthologs ( TGD the Lineage Gar rates. rearrangement account forteleost not might high fully and ment rate, even after accounting for the TGD ( similar to that in the chicken lineage. Zebrafish had a higher rearrange is that gar to relative pufferfish) and medaka (stickleback, comorphs account further found an into average fission TGD and translocation the rate in per taking Comparisons TGD. the experiencing without ( somes 10 Fig. divergence from gar but before the TGD ( lobe ray the in occurred have to eosts ( ( chicken with chromosomes whole many 7* 10

Analyses of developmental gene families showed stability in the the in stability showed families gene developmental of Analyses 10

15

22

19 25

5* *

1 6 14 21 Supplementary Supplementary Note 9

- 19 clarifies 11 2 ). ). Gar has 43 Hox genes organized into four clusters, as expected

finned lineage finned 29

20 17

Fig. 2 24 S

8* 4

). This finding explains how spotted gar has more chromo more has gar spotted how explains finding This ).

upplementary Note upplementary 4 22

- n 17 7*

specific loss of ohnologs often followed VGD1, VGD2 and and VGD2 VGD1, followed often ohnologs of loss specific = 29; 29; = 20 16

22 3* 27 26

7 11

i. 1 Fig. 12 Supplementary Note Supplementary ,

e 14

3

5 21 ). These results indicate that chromosome fusions thought 24 16 9 28

and frustrates the translation of knowledge from tele­ from knowledge of translation the frustrates and 23 2 29

vertebrate 18 1*

6* 15 13 3

Fig. 2 Fig. 19 23

11 22 9 8

14 a 12 1*

18 13 14 10 20

), which complicates the identification of true true of identification the complicates which ),

25 3*

3 16 9

7

Fusions 26

actually occurred in the teleost lineage after after lineage teleost the in occurred actually 10 7 * f a 24

) than typical teleosts ( teleosts typical than ) 5

- 1 for details. for Teleost ancestor ind ih ( fish finned ). ). These comparisons indicate that the TGD 4 Bony vertebrateancestor macro +micro gene Chicken (Gga) macro +micro -

15 Z* Gar (Loc) finned lineage after divergence from the the from divergence after lineage finned

17 2

z 2 13

macro 23 21 8

12 18 6 19 28 27 macro +micro 20 15

d 21 17 ). Chromosome size TGD gar (Mb) Gga15 20 40 60 80 upeetr Fg 12 Fig. Supplementary 0

Loc20 evolution Loc2* 2 0 1 Chromosome sizechicken(Mb) Fig. Fig. 2 3 . Gar is uniquely . informa Gar is uniquely Fig. 2 Fig. Supplementary Supplementary Fig. 11 4 0 Rediploidization Rearrangements f n and Ola9 ~24 or 25; ref. ref. 25; or ~24 c s e l c i t r A ) but few with tel with few but ) 6 0 Supplementary Supplementary Supplementary Supplementary Ola12 8 0 Medaka (ola)

macro R 1 9 Loc21 2 =0.97 0 GgaZ* Gga17 2 . No ). 429 3 8 ­ ­ ­ ­ ­ )

© 2016 Nature America, Inc. All rights reserved. ( out hypothesis the rules ( one just has medaka and Aldh1a2 vertebrates among number ( enzymes Aldh acid–synthesizing expression gene cluster Hox regulates acid Retinoic independently teleosts and tetrapods ( and coelacanth fishes in cartilaginous the VGD1/VGD2Gar possesses ohnolog VGD2 ohnolog ‘gone missing’ from teleosts ( containing seven genes. Gar retained more complete than those in teleosts and with tetrapods, four clusters ( clusters ( losses) (two coelacanth and ( one Hox commoncluster last bonygene from the vertebrate ancestor genes of massive Hox TGD.strating the after loss gene Teleosts orthologs lack example, zebrafish has 49 genes and stickleback has 46 genes), demon genome after duplication (for Hox expected 82 the cluster than genes ( gar in in present but teleosts ray common last the from divergence since gar in lost completely been has gene Hox bold). in (highlighted genes ( bold). in (highlighted mammalian the to ohnologous TGD 2 are chromosome zebrafish on region a miRNA-free 8 and chromosome zebrafish and LG4 and ( TGD the from derived 5 are and 10 chromosomes zebrafish on regions gene Scpp the that suggests LG2 on cluster gene Scpp gar the for synteny conserved of Analysis Mb). in (distance locations chromosomal distant into genes clustered originally separating rearrangements gar gar connect mark) a question by (indicated similarity ( vertebrates lobe-finned or ( teleosts either from only known previously genes of orthologs through lineages connects Gar genes. Spp1 and Odam to limited been previously had zebrafish) example, (for teleosts and coelacanth) and human example, (for vertebrates lobe-finned among ( (yellow) genes Sparc-like and genes Scpp (blue) acidic and (red) P/Q-rich including 3 Figure s e l c i t r A 430 gene repertoire important despite rhythms, control of light circadian a Supplementary Fig. 16 Supplementary hoxa14 Zebrafish Coelacanth Spotted ga Human Physiological Physiological mechanisms are shared among vertebrates, including hoxa6 lpq1

S upplementary Note upplementary 1 5 and percomorphs lack the HoxCb the cluster lack and percomorphs Supplementary Fig. 13 Fig. Supplementary and coelacanth coelacanth and ), fewer than tetrapods (for ), fewer than tetrapods example, human has losses) three and and Gar helps connect vertebrate protein-coding and miRNA genes. ( genes. miRNA and protein-coding vertebrate connect helps Gar r Supplementary Table 5 Supplementary LG X/ 126997 JH Aldh1a3 1 hoxd2 4 Y 4 Amel AMEL - 128762 JH finned ancestor. The The ancestor. finned

, zebrafish lacks all HoxDb , all lacks zebrafish cluster protein Sparcl1-like c 17 Mb ), zebrafish has two genes (

) Gar newly connects through synteny zebrafish TGD-derived ohnologs ohnologs TGD-derived zebrafish synteny through connects newly ) Gar Sparcl1-like ). ( ). Scpppq4 lpq8 4

). lpq7 Scpppq3 5 enam ? ? b that ENAM

fa93e10 enam 129486 JH ) The gar ‘conserved synteny bridge’ ( bridge’ synteny ‘conserved gar ) The aldh1a2 AMBN 4 scpp5 scpp5 4 AMTN

: tetrapods have three genes ( genes three have tetrapods : scpp7 Supplementary Fig. 12 Fig. Supplementary scpp7

, , MUC7 Aldh1a1 lpq6 Enam ; and gar gar ; and ambn ). In contrast, teleosts have far fewer fewer far have teleosts contrast, In ). ambn PROL1 ? Ambn

and and scpp6 4 lpq5 PROL3 0 Amtn )

, is recognizable as a pseudogene pseudogene a as recognizable is , PROL5

4 lpq4 , , upeetr Note Supplementary 5 Supplementary Figs. 14

cdx2 CSN3

4 scpp3dl . Finding all three genes in gar gar in genes three all Finding . dmp1 128739 JH Supplementary Note Supplementary 2 scpp3b FDCSP

was a lobe a was scpp3cl , indicating that , indicating

lpq5 ODAM

hoxd14 scpp3a scpp3bl pdx2 , which highlights , a which highlights VGD1/ scpp3al Scpppq2 PRR27

, , odam odam HTN1 Supplementary Supplementary Fig. 14 Odam with with dsppl1 lpq3 HTN3

, , previously found only Scpppq5 scpp9 scpp9 STATH aldh1a2 gene, missing from from missing gene, lpq2 CSN2 ? 4 enam Amtn - 1 lpq1 Scpppq4 CSN1S1 finned innovation finned , , , but gar lacks just but, lacks gar SCPPPQ1 ibsp ). Gar ParaHox Gar ). 4 sparcl1 sparcl1 Sparcl1 SPARCL1 genes in lobe-finned vertebrates. Arrows in human and zebrafish indicate intrachromosomal intrachromosomal indicate zebrafish and human in Arrows vertebrates. lobe-finned in genes 3 , , , but retinoic retinoic but , and pdx2 ambn and and Scppa3 DSPP dmp1 Dmp1 DMP1 ) vary in in vary ) scpp1 scpp1 ) are also also are ) Aldh1a1 aldh1a3 was lost mepe and and and

- dsppl1 S

10 Mb Dsppl1 coding coding

upplementary Note upplementary ibsp lbsp IBSP mepe1 15 MEPE lpq14 ). Further putative orthologies supported by only short stretches of sequence sequence of stretches short only by supported orthologies putative Further ). mepe2 ). ). ). ­ ) , 10 a LG

) Scpp gene arrangements in human, coelacanth, gar and zebrafish zebrafish and gar coelacanth, human, in arrangements gene ) Scpp 2 genes with zebrafish zebrafish with genes

on Hsa1, Hsa9 and Hsa19; these four ohnologs likely arose in in arose likely ( (ref. ohnologs VGD2 four and these VGD1 Hsa19; and Hsa9 Hsa1, segments on syntenic has Hsa6 on region III class MHC human The ( genes I class MHC teleost of orthologs to adjacent are some and the Hsa6, on in region III encoded class or II are class MHC orthologs human whose genes to linked folds 20 Fig. and 19 some distinct from Fig. teleost DA/DB and DE lineagesSupplementary ( (Z/P specific to teleost be ( folds scaf on unassembled remain genes MHC gar most although ration, as in tetrapods osts tele in but are unlinked fishes cartilaginous and tetrapods in linked ( genes II class and I class ( teleosts in pinopsin of lack the for pensate exo with along from teleosts, absent but tetrapods in present pinopsin, has Gar event. TGD the in which circadian genes originated in VGD events and clarifies which gar originated example, for tetrapods: and teleosts of repertoires gene Table 7 Supplementary and ( tetrapods clock circadian and teleosts between differences spp1 spp1 Spp1 SPP1 Supplementary Note Supplementary Supplementary Table 8 Table Supplementary lpq17 Evolution of vertebrate immunity becomes clearer using gar gar using clearer becomes immunity vertebrate of Evolution scpp8 lpq16b 51 scpp11b upeetr Note Supplementary lpq16a ?

, scpp11a S 5 Supplementary Fig. 21 Fig. Supplementary lpq15 2 ). Several gar MHC region genes are on unassembled scaf unassembled on are genes region MHC gar Several ). upplementary Note upplementary . In gar, at least one pair of class I and class II genes is linked linked is genes II I class and of class . pair one In gar, at least ) infers that the miRNA cluster of of cluster miRNA the that ) infers 5 lpq14 lpq13

VOLUME 48 | NUMBER 4 | APRIL 2016 APRIL | 4 NUMBER | 48 VOLUME lpq12 lpq11 JH128070 53 mir135c-1 gsp37 lpq10

, scpp12

5 lpq9 Supplementary Fig. 17 Fig. Supplementary 4 sparcr1 , suggesting that gar retains the ancestral configu ancestral the that gar retains , suggesting fa93e10 b c ). Major histocompatibility complex (MHC) (MHC) complex histocompatibility Major ). and and TG and and ; ref. ; ref. - TG ). scpp1 D like, L like, ) and some class II genes similar to to similar genes II class some and ) 5 Supplementary Figs. 19 Figs. Supplementary D , , ) 7 4 scpp6 - 8 ), as supported by the gar genome genome gar the by supported as ), Supplementary Note Supplementary mir135c-2 rhodopsin, previously thought to com thought previously rhodopsin, n osn ( opsin and 6 ). Gar has some class I genes thought thought genes I class some has Gar ). 8 , , Chr. - Chr. ). Orthologies (gray vertical bars) bars) vertical (gray Orthologies ). scpp3 Chr. 22 Chr. LG like and U/S like and and 9 5 3 Chr. 23 Chr. 11 LG Chr. Chr. 8 3 , , scpp8 1 1 with mammalian mammalian with mir731 scpp5 wdr6 Supplementary Table 8 Table Supplementary , , 5 nckipsd upeetr Fg 18 Fig. Supplementary 0 Supplementary Table 6 Table Supplementary genes, respectively; respectively; genes, . S

, , elk4

Mir425-191 celsr3 - 46 upplementary Fig. 26 Fig. upplementary and and scpp7 like, for example like, dalrd3 ,

4 mfsd 7 Nature Ge Nature

. Analyses of gar gar of Analyses . mir731 mir135c-1 mir135c-2 4 MIR425

mir462 cdk1 ) 4 and and

– mir462 Supplementary 9 8 MIR191 21 genes link the the link genes mir135 MIR135B ndufaf3 Mir135B ) are tightly tightly are ) scpp9 cluster cluster zgc:9228c impdh2 on gar gar on

arih2 klhdc8 n 7 zgc:112334 ) etics 54– pm20d1a

5 ). ). 6 ­ ­ ­ ­ ­ , ;

© 2016 Nature America, Inc. All rights reserved. gene orders in the gar and zebrafish clusters ( clusters zebrafish and gar the in orders gene by conserved supported a hypothesis orthologs, divergent have may scpp6 gar The clusion drawn by others using our data released before publication a con transcriptomes, and genome gar the in for Amel) ortholog no enamel enamel with vertebrates lobe in found are (Amel) amelogenin and (Enam) enamelin Supplementary Note ( teleosts or lobe vertebrates finned from only known Another 18 gar genes Scpp have ortholog no in either identified lobe genes Scpp six and includes orthologs of five Scpp genes previously found only in teleosts Note Supplementary Fig. 26 ( LG4 and LG2 on clusters two in gar in genes Scpp 35 tified common to lobe genes; zebrafish, 15 genes), and only14 2 genes (Spp1 and to Odam) seemed be coelacanth, genes; 23 (human, genes Scpp of number est Note enamel and enameloid areboth coveredby which contain ganoin, hypothesized to be a type of enamel because it retains ancient characteristics both in its ganoid scales, which sial between teleosts and tetrapods and their evolution remains controver phosphoproteins (Scpp) that form these tissues and enamel), yet the gene repertoires for the secretory calcium enameloid dentin, (bone, tissues mineralized share vertebrates Bony Gar tetrapods. to teleosts bridges genome immuno­ gar the In sum, duplication. gene after divergence or rapid duplications tandem recent few suggesting families, 15 form genes specific teleost be to thought were and nition receptor) genes ( from teleosts and/or tetrapods. Gar encodes Nitr (novel immune lies fami TLR major six all encompass TLRs gar identifiable 16 the that genes ( J and in unlike but, mammals ( genes receptor cell T Gar protection. surface exterior for suffice may scales ganoid gar immunity mucosal IgMsecond, distinct butlocus lacks IgT (IgZ) a has gar Unexpectedly, teleosts. of those resemble generally scripts S in as a bridge as gar using when 95 to opposed as elements 81 for established be only could orthology human-teleost Direct enhancers. limb human 153 of (red) loss and (green) gain evolutionary the shows alignments ( genomes. teleost and gar aligning gar,then to and genomes tetrapod aligning first when evident become but teleosts and human between align directly not do that elements for uncovered is orthology Hidden teleosts. to gar through ( 4 Figure Nature Ge Nature a upplementary Note upplementary ) The gar bridge principle of vertebrate CNE connectivity from human human from connectivity CNE vertebrate of principle bridge gar ) The The enamel matrix protein genes encoding ameloblastin (Ambn), (Ambn), ameloblastin encoding genes protein matrix enamel The ( genes immunoglobulin Gar 18 6

2 uncovers , a ), which contain contain which ), . amla gnms ee huh t cnan h larg the contain to thought were genomes Mammalian ). . Gar TLRs appear to share evolutionary histories . with the TLRs Gar histories appear TLRs to share evolutionary H 67 (ref. (ref. . See . See segments. Phylogenetic analyses of Toll of analyses Phylogenetic segments. Supplementary Fig. 24 Supplementary , - 6

ambn bearing teeth bearing 8 Gar provides connectivity of vertebrate regulatory elements. elements. regulatory vertebrate of connectivity provides Gar . Gar clarifies understanding of these genes and their evolution n S 13 upplementary Figure 37 Figure upplementary etics 3 and

) and and ) evolution Supplementary Fig. 25 - b finned vertebratesfinned and teleosts ) Connectivity analysis of 13-way whole-genome whole-genome 13-way of analysis ) Connectivity

enam for details. for 6 0 Supplementary Fig. 23 Fig. Supplementary VOLUME 48 | NUMBER 4 | APRIL 2016 APRIL | 4 NUMBER | 48 VOLUME , suggesting that IgT is teleost specific and that that and specific teleost is IgT that suggesting , ). fa93e10 66 , , , 6 spp1 - genes show sequence similarity to zebrafish to zebrafish similarity show sequence genes Supplementary Supplementary Table 9 Xenopus tropicalis Xenopus 8 bearing teeth but not in teleosts, which lack lack which but not in teleosts, teeth bearing . We identified We . identified

of and and

, respectively, suggesting that teleosts teleosts that suggesting respectively, , vertebrate ) in tetrapods, teleosts and gar teleosts ) showed in tetrapods, Fig. 3 Fig. Supplementary Fig. 22 Fig. Supplementary odam , , S a upplementary upplementary , , ), which function in allorecog­ Supplementary Table 9 Table Supplementary , respectively. Notably, gar gar Notably, respectively. ,

ambn mineralized 6 1 ) are tightly linked as in in as linked tightly are ) , are downstream of V of downstream are , 58 65 Fig. 3 Fig. , 63 and and 5 6 , - - and 9 6 8 finned vertebrates. vertebrates. finned like receptor (TLR) (TLR) receptor like , 6 , , thought to provide ( 6 7 differ substantially 4 0 T Fig. 3 6 . The 17 gar Nitr Nitr gar 17 The . ( able 22 able 9 enam , , and in its teeth, a Supplementary Supplementary Supplementary Supplementary ).

tissues a

) and tran and ) ). We). iden genes (but (but genes and the the and - binding - Fig. 3 Fig. finned finned - and type type 13 a 3 H ­ ­ ­ ­ ­ ­ - , .

Fig. 28 Fig. ( family known a lack 4 and families 107 to belong of 229 302 from mature 233 genes, which miRNAs derived identified approach ( Note Supplementary ( putationally loss in or one lineage gain in the other. We gar miRNAs studied com emerged in teleosts. Of the 22 families thought to have been lost lost been have to thought families 22 the Of genes teleosts. miRNA in emerged individual four and families four that showed sons specific tetrapod or teleost become could genes miRNA Gar lineages. both of characteristics genes Scpp ancient of subsets lost independently lobe and teleosts although that, and tissues, ized mineral in expressed were which of many genes, Scpp of repertoire rich a had vertebrates bony extant of ancestor common the that ing ( in bone 4 genes another scpp9 encode the principal organic component of the ‘true enamel’ that that lobe in enamel’ originated have to appears ‘true the of component organic principal the encode to subsequently evolved have may gene Amel The scales. ganoid in used originally genes matrix enamel by coopting evolved likely teeth enamel before arose and osteichthyans gnathostome fossils suggested that ganoin is plesiomorphic for crown gar enamel and ganoin supports homology of these tissues. Analysis of expression of Ambn and Enam in lobe ganoin in ganoid likely represent scales the same tissue, and common and Note Supplementary ( ganoin with scales includes that skin of sion Gar expresses 12 additional Scpp genes (including the the (including genes Scpp additional 12 expresses Gar RT

enam connects - yemnrlzto genes hypermineralization PCR PCR and our expres gar identified skin analysis transcriptome and and b a ambn ve is limited to is and limited enamel and Thus, in enamel ganoin. teeth rt Jaw ve Supplementary Supplementary Note (93) ebrat T rt Supplementary Table 11 Table Supplementary eleost ed ebrat and and Supplementary Fig. 27 Fig. Supplementary Bon

es vertebrate s es (1 y enam Alignment +24 17 ) and annotated them using a sequence a using them annotated and ) ), suggesting that strong expression of of expression strong that suggesting ), ) Lobef in enamel in Ra Supplementary Table 9 Supplementary (1 (1 +1 –7

28 No dir to yf microRNAomes 10 1 ins ins

) Gar br ga ) Hidden or –1 r ect sequencealignmen ). ). Small RNA +25 5 –2 Spot –5 idge pr - 6 finned vertebrates finned 6 - te ) in both teeth and scales and and scales and teeth both in ) containing gar teeth and in gar gar in and teeth gar containing thology Te - Co d , , T finned vertebrate enamel and in Spot Elephant shar ). Gar ). eleosts (95) trapods inciple Supplementary Table 10 Supplementary ga elacanth (1 Supplementary Table 9 Table Supplementary Human (1 7 r Alignment te 1 , d g 13 - 3 - ar (1 zebrafish ; thus, enamel thus, ; seq seq data for four tissues 53 23) 08 k (93) t to ) - 6 ), strongly suggest ), strongly s e l c i t r A )

finned vertebrates vertebrates finned 5 ga , gar has retained retained has gar , r Supplementary Supplementary human-t 81 element 6 alignment: 8 73 Human Dir , 13 ect 18 , eleost 7 3 4 , 7 . odam compari s 2 - by their their by bearing bearing - based based ambn and and and and and and 431 ­ ­ ­ ­ ­

© 2016 Nature America, Inc. All rights reserved. the zebrafish TGD Additionally, mammalian osts mammalian the loss. family accelerate not did TGD the that suggesting teleosts, to specific was loss family miRNA no Notably, ( fishes finned lobe and teleosts in present family single A teleosts many and in gar are present and three and teleosts, teleosts. gar both from in absent are families Fourteen overlooked previously were genes miRNA gar 4 of orthologs and family teleosts in S 50 bars, Scale hindlimb. HL, forelimb, FL, image; each of right bottom the at indicated is signal limb showing embryos LacZ-positive of number The (right). forelimbs in only but E10.5 at limbs mouse developing in expression reporter drives CNS65 Zebrafish (middle). digits developing in absent is and limb the of portion proximal the to restricted is activity CNS65 gar (E12.5), stages later At (left). E10.5 stage at (arrows) hindlimbs and forelimbs mouse early the throughout expression drives ( fins. pectoral the of portion distal the indicate lines Dashed (bottom). h.p.f. 48 around deactivated becomes and fin, the throughout expression GFP drives which h.p.f., 31 at beginning activity fin pectoral has CNS65 Gar (top). (h.p.f.) fertilization post- hours 36 at fins pectoral zebrafish in expression GFP reproducible and robust drive orthologs CNS65 (right) zebrafish and ( (right). pufferfish and zebrafish in orthologs CNS65 identified baseline the as CNS65 gar identified the Using (middle). sequence gar the in conservation of a peak shows gar, however, including alignment An left). (bottom pufferfish) and (zebrafish teleosts with not but CNS65 for chicken and human with conservation sequence show desert gene HoxD the of alignments VISTA baseline, the as mouse Using limbs. in expression HoxD early-phase drive that enhancers CNS65 and CNS39 the contains which desert, gene telomeric HoxD mouse ( CNS65. enhancer HoxD early-phase teleost and gar the of 5 Figure s e l c i t r A 432 the slowly evolving gar. evolving slowly the however, in fied tetrapods, in might be ray detected evolution ( sequence teleost rapid of because presumably teleosts, in absent be as function often CNEs Gar to their incorporation into multiple gene regulatory networks esis that miRNA genes are likely to be retained after a duplication owingfor protein retentionconsiderably analyzablethe than (81/208 cases), higher rate post The upplementary Note upplementary The ‘garThe bridge’ miRNAFor identify example,helpsorthologies. to 1

8 highlights , are orthologs of teleost

µ Identification and functional analysis analysis functional and Identification - m m ( TGD retention rate for zebrafish miRNA ohnologs is 39% 39% is ohnologs miRNA zebrafish for rate retention TGD - 1 Fig. Fig. 1 coding genes (20–24%; ref. 8 , 2 actually belong to the same same the to belong actually 2 , b ) and 500 500 ) and 7 miR150 4

but absent from zebrafish. zebrafish. from absent but hidden Mir425 b - a and derived ohnologs ) Top, schematic of the the of ) Top, schematic . ) was not found in gar.in found not ) was Supplementary Note Supplementary µ and cis

orthology m m ( Mir135B - c acting regulators acting ) Gar CNS65 CNS65 ) Gar c mir731 Mir191 ). See also the the also See ). b ) Gar (left) (left) ) Gar is orthologous to

and mir135c-1 genes, thought to be lost in tele in lost be to thought genes, of

7

cis 5 mir462 ), consistent with the hypoth

-regulatory

- ); ancestral ); CNEs ancestral identi 80

and , 8 , respectively ( 1 , , but many to appear a c b mir135c-2 C Zebraf Pu Chic Human Baseline: mir135c - entr finned fish using finned ff HL erf ke

omer elements ish ish n mous Ho e in gar and xD clust C e ( 76– Fig. 3 Fig. 3 hr 7 . 9 2: er . 75,438,81 31 h.p b c Gar CNS65 ). ). FL ­ ­ ­

.f . 3–75,441,00 aligned to gar, confirming their presence in the bony vertebrate vertebrate bony the in presence their confirming gar, to aligned human by detected not enhancers limb human direct through tified in gar,(108) 71% were but iden (81) 53% only identified alignments Note 36 Fig. and shark; elephant tetrapods five coelacanth, eosts, whole Tables 12 ray of ( ancestor fishes common the in recruited CNEs identified human by direct that were not shark) predicted from elephant (absent CNEs many bony vertebrate gar identified incorporating Analyses teleosts. bony between vertebrates and CNEs (conserved elephant shark) than that Pax6 more showed gar and contains gnathostome IrxB) clusters, E1 Gar CNS65 Gar elucidates the origins of tetrapod limb enhancers, evidenced by ParaHox and (Hox loci gene developmental near analyses CNE 0.5 3/3 Mtx2 ). Of 153 known human limb enhancers limb human known 153 Of ). - 1 genome alignments for 13 vertebrates (including gar, five tel gar, five (including vertebrates 13 for alignments genome - , , upeetr Fg. 14 Figs. Supplementary ees cmaios frhroe gar furthermore, comparisons; teleost Zebraf Pu Chic Human Gar Baseline: Supplementary Tables 20 20 Tables Supplementary – ff 19 erf VOLUME 48 | NUMBER 4 | APRIL 2016 APRIL | 4 NUMBER | 48 VOLUME ke ish ish n and and mouseChr 38 h.p Gar CNS65 Supplementary Note Supplementary .f . 2: . 75,438,81 HL CNS39 FL 3–75,441 E1 - teleost alignments. Of the 72 human human 72 the Of alignments. teleost 5/5 2.5 -0 , 01 15 15 48 h.p and Zebraf Chic Pu Mouse Human Zebraf Baseline: and - ). ff teleost alignment, 40% (29) (29) 40% alignment, teleost erf CNS65 ke 21 .f 29 ish ish . ish CNS65 n

ga 33 , and and , r Zebraf – , 82–

35 - LG Nature Ge Nature based alignments alignments based 8 , , ish CNS65 12 4 Supplementary Supplementary Supplementary Supplementary , human , Supplementary Supplementary : 15 ,379,653–1 Hnr FL npa3 T 55 h.p - elomer - 5,382,037 n E1 centric centric finned finned etics 0.5 3/4 .f . e ­ ­

© 2016 Nature America, Inc. All rights reserved. that gar might be a better model than teleosts for investigating investigating for teleosts than model better a be might gar that ( (2) gar (15) than enhancers limb more substantially lost teleosts that found we 22 Table Supplementary ( teleosts in fate their and ers enhanc limb of vertebrate origin evolutionary of the understanding tel one least ( at eost to aligning (14/29) human 48% of gar then Inspection and loss. defini than tive rather divergence not rapid repre sent enhancers might teleosts in human identified directly 29 the whether and ( data gar without lobe as ized character incorrectly been have would and tel in not but gar in occurred Fourteen ancestor. gnathostome the in existence their highlighting shark, elephant to aligned also diver 15 enhancers, 29 these Of considerable teleosts. in gence or loss and ancestor embryo. Emb, Te, Ov, ovary; testis; intestine; Int, bone; Bo, kidney; Liv, Kid, liver, muscle; Mus, heart; Hrt, in bars Error Bonferroni correction, gene (multiple two-sided Student’s the expression level of the preduplication expression level of ohnolog pairs approaches singleton/gar ratios, suggesting that the aggregate pair/gar ratios were not statistically different from significantly lower levels than singletons; ohnolog genes, individual ohnologs were expressed at gar to comparison In ortholog(s). teleost and genes gar for levels expression of ratios ( significance). correction, Bonferroni with tests Mann-Whitney Wilcoxon (multiple pair ohnolog the by subfunctions ancestral of sharing indicating singletons, for that than and alone ohnologs for that than greater was genes gar and pairs ohnolog for levels expression average between correlation The ortholog(s). teleost and genes gar of patterns ( (Md; medaka and (Zf; zebrafish in singletons and ohnologs ( criteria. subfunctionalization and neofunctionalization The pattern. gar the with ohnolog each of profile expression the of c gar. in as In heart in expressed other the and gpr22 ( liver. for ohnologs and ( ohnologs TGD-derived ( distribution ( TGD. the after evolution 6 Figure Nature Ge Nature fin the e and and , f Using the gar bridge ( bridge gar the Using ) Mean correlation between the expression expression the between correlation ) Mean S Supplementary Note Supplementary upplementary Note upplementary with one expressed in brain as in gar gar in as brain in expressed one with d d upeetr Tbe 22 Table Supplementary - , the , the ) Subfunctionalized TGD orthologs of orthologs TGD ) Subfunctionalized to

e Gar illuminates gene expression expression gene illuminates Gar – - limb transition limb h n r ) Expression conservation for for conservation ) Expression slc1a3 e b values denote the correlation correlation the denote values - etics – ) of gar and teleost singletons and and singletons teleost and gar ) of finned vertebrate innovations innovations vertebrate finned g h - , , s.e.m. Br, brain; Gil, gill; gill; Gil, Br, brain; , s.e.m. centric alignments showed showed alignments centric i. 4 Fig. h f ) Mean log ) Mean , h showing new expression in expression new showing

Supplementary Table 22 Table Supplementary S ) ) ( upplementary Note upplementary α VOLUME 48 | NUMBER 4 | APRIL 2016 APRIL | 4 NUMBER | 48 VOLUME S S = 0.05 for significance). = significance). 0.05 for b upplementary Note upplementary upplementary upplementary ). ( ). a and and , b α Fig. 4 Fig. c 8 ). Strikingly, despite using the gar bridge, bridge, gar the using despite Strikingly, ). ). ) The origin ( origin ) The 10 = 0.05 for for = 0.05 ) Neofunctionalized ) Neofunctionalized 5 . -transformed -transformed upeetr Fg 37 Fig. Supplementary Fig. 4 Fig. a t test with ). Gar thus substantially improves improves substantially thus Gar ). ), we tested tested we ), T - b able 23 able centric centric lists lists e a , ,

, ) and ) and Supplementary Fig. 37 Fig. Supplementary g e

). ). ) osts osts

­ ­ ­ ­ ­

e a g d c

Expression Expression 360

r value TG 100 Genes TG 100 350 Gene origins ), suggesting suggesting ), * * 0.25 0.35 0.45 Genes log (ratio) 0 D 10 0 D 290

0.12 0.16 0.04 0.08 gpr22 slc1a3 0 Br MYA Br Singleton 10,41 Singleton Gil Gil 10,41 Zf a and and Zf a 6 Zebrafish Hrt Hrt Zebrafish 6 Medaka gene Zebrafish gene Medaka gene Zebrafish gene Gar gene Medaka gene Zebrafish gene Medaka gene Zebrafish gene Gar gene ­

Mus Mus Ohno Zebrafish 1,606 Ohno Zf Human early drive elements and The CNS39 CNS65 elements. regulatory from but shared cryptic several features that are presumed to be homologous respectively distal mammalian limbs by ‘early’ and ‘late’ phases of gene expression, of a ‘gar bridge’.CNE and proximal pattern HoxD and HoxA clusters a 1,606 Liv Liv Zf b 1 Functional studies of a HoxD limb enhancer tested the usefulness usefulness the tested enhancer HoxD limb a of studies Functional

a a Kid Kid 1 Ohno a2 a1 a1

1,606 a2

Bo a2 a1 a2 a1 Bo - Ohno Zf a centric ( centric 1,606 Zf b 2 Int Int 8 2 Singleton Co-orthologs 6 Ohno pair Ov . . Early Ov Orthologs 1,606 r r r r Ohno pair =0.6494 =0.7523 =0.9377 =–0.1987 Zf ohnologs TGD b 1,606 Te Supplementary Table 22 Supplementary Te Gar Zf a Emb Em - - phase HoxD activation in mammals HoxD phase activation phase HoxD expression in and HoxD shows fins limbs phase expression b Ohnolog 1 b n n =1,606 =9,265 h f

log10 (ratio) r value 100 100 –0.06 –0.04 –0.02 0 0 0.02 0.04 0.06 0.35 0.25 0.45 n TGD ohnologs n =1,438genes =314pairs 0 Zebrafish Br singletons Br Ohnolog 2 Medaka n Singleton Singleton =518 Gil Gil 9,265 9,265 Md Md a a Hrt Hrt Mus Mus ) and local mouse local and ) Medaka Ohno Ohno 1,315 1,315 Md Liv Liv Md n = 774 b a n = 7,309 Medaka 1 1 Kid Medaka Kid n s e l c i t r A =2,840genes TGD ohnologs n n 8 =26 singletons Ohno Bo =274pairs Bo 7 Zebrafish Ohno 1,315 1,315 Medaka and may derive Md Md b a Int Int 7 2 2 8 8 Ov Ov ( Ohno pair r r r Ohno pair n r =0.6259 =0.9371 =–0.0647 =0.7287 Fig. 5 Fig. =10,416 n - 1,315 1,315 =1,315 centric centric Md Te Te Md a b Em Em 433 a b b ). ). © 2016 Nature America, Inc. All rights reserved. expression of orthologous teleost singletons, (ii) expression of each each of expression (ii) singletons, teleost orthologous of expression tissue plotted we sion, subfunctions regulatory ancestral of partitioning ( heart in other the and brain in ohnolog gpr22 functions new evolved other the while patterns ancestral maintained ohnolog ( testis or bone in expression little with domain, primarily expression novel liver,a in other the and brain in primarily ohnolog one expressed gar expressed ( species each in gene each for tissues across reads normalized then and medaka and gar,zebrafish RNA ( phylogenetic using medaka and TGD orthologous their and genes gar of list a a generated we using outgroup, TGD studied related been closely yet not has fates various of contribution the functions ancestral of partitioning and functions, tein loss cation: of one copy, of evolution domains or new expression pro non several experience Ohnologs Gar research. biomedical for models teleost enhancing potentially testing, functional for osts tele­ in model regions candidate relevant biomedically identify helps locations in zebrafish ( disease undetected otherwise ing condi in genome tions human to linked SNPs 1,000 around contain CNEs human Note Supplementary human these on depending overlap; CNEs (30–36%, of many for zebrafish to gar to human from ( bridge gar The teleosts. in lobe in originated elements these Note ( teleost any to not human 19,000 Around genomes. centric CNEs (39,964/143,525), more than in any of five aligned teleost biology. developmental mammalian to mechanisms regulatory teleost cryptic otherwise connects gar that and vertebrates bony of feature conserved ancestral, an is fins and limbs in expression phase of that HoxD regulation suggest early experiments functional These ( expression ( activated forelimb expression somewhat more weakly than (ref. gar CNS65 CNS65 mouse of activity the from able limbs of embryonic day (E) 10.5 mice ( fin ( pectoral zebrafish developing in the expression drove early zebrafish zebrafish and mice ( CNS65 used we ( bridge gar the by using only teleosts in identified was CNS65 Notably, gar. in CNS65 identified ( s e l c i t r A 434 Fig. 6a Fig. Fig. 5 Fig. Fig. 5 Fig. To of the effects the characterize TGD on evolution of gene expres To compare tissue Across the gar genome, we identified approximately 28% of human To test whether cryptic CNE orthologs preserve enhancer function,

Fig. Fig. 5

illuminates - ). Without gar, one would have erroneously concluded that that concluded erroneously have would one gar, Without ). mostly in brain and heart, but both teleosts expressed one one expressed teleosts both but heart, and brain in mostly seq seq analysis for ten adult organs and stage Fig. 6 Fig. c a ). At E12.5, gar CNS65 activated proximal but not distal limb but not proximal distal activated ). Atgar CNS65 E12.5, , ) alignments failed to detect CNS39 in ray in CNS39 detect to failed alignments ) b 9 , , b 5 Supplementary Table 23 Supplementary ). ). Gar drove CNS65 in expression the forelimbs and hind before the teleost radiation. In contrast, gar expressed expressed gar contrast, In radiation. teleost the before Fig. 5 Fig. c ). New expression domains like this are expected if one if are expected this like domains New expression ). slc1a3 - - wide association studies (GWAS), studies connect thereby association wide

driven reporter constructs to generate transgenic transgenic generate to constructs reporter driven gene c ), mimicking the endogenous mouse enhancer mouse endogenous the mimicking ), mainly in brain, bone and testis, but both teleosts Supplementary Note Supplementary Supplementary Table 21 Table Supplementary - specific gene expression specific patterns, we conducted ). These approximately 6,500 newly connected connected newly 6,500 approximately These ). Supplementary TableSupplementary 21

- expression - specific expression levels in gar versus (i) (i) versus gar in levels expression specific derived ohnologs or singletons in zebrafish zebrafish in singletons or ohnologs derived Fig. 4 Fig. Fig. 5 Fig. Supplementary Note Supplementary 9 3 - - and conserved synteny conserved and - exclusive fates exclusive after genome dupli associated haplotypes to genomic genomic to haplotypes associated centric CNEs aligned to gar but but gar to aligned CNEs centric -

a a evolution finned vertebrates or were lost lost were or vertebrates finned and and and and ) establishes hidden orthology orthology hidden establishes ) Fig. Fig. 5 Supplementary Table Supplementary 21 Supplementary Note Supplementary Supplementary Table 22 Supplementary ). ). CNS65 from either gar or Fig. 6 Fig. c ) ) that was indistinguish 8

- and and following 8 ). ). The gar bridge thus matched embryos for d ). Zebrafish CNS65 CNS65 Zebrafish ). ), as expected from from expected as ), 8 9 . - Supplementary Supplementary finned fish but but fish finned ). For example, example, For ). 89–

9 9 the 2 4 . Because Because . analyses analyses - centric centric

TGD ). and 8 ). 8 ­ ­ ­ ­ ­ ­ ­ - - .

however, showed no statistical difference from the singleton/gar singleton/gar the from difference statistical no showed however, ( orthologs gar to compared as singletons than levels lower significantly at expressed each were ohnologs individual that showed Comparisons ortholog. gar its to gene teleost each of ratio the computed and tissues 11 the model subfunctionalization the by predicted pair, as gene a as functions ancestral maintained which TGD between partitioned be to tended functions ( singletons with than gar with more and alone taken correlated closely gar ohnolog than either with TGD tissue average the contrast, In singletons. evolved expression pattern differences significantly more rapidly than have to appear not do random at binned ohnologs teleost gar, from single ancestral to compared when Thus, TGD teleost their of copy either of those and genes gar between patterns expression of correlation the from different significantly not was orthologs singleton teleost their of patterns the expression gar between genes and of correlation those the that showed results Our coefficients. correlation calculated then we and pair’), (‘ohnolog ohnologs retained both of level expression TGD org/index.htm uorego Lepisosteus_ocu URLs. ity of teleost models medical to human biology. ological features. In conclusion, the gar bridge facilitates the connectiv physi and developmental vertebratemany analyzing when choice of the laboratory The availability of gar embryos and the ease of raising eggs to adults in osts and humans but are elusive in direct teleost helps identify potential gene regulatory elements that are shared by tele and lobe are or oftensharedtetrapods toteleosts unique by sidered ray editing disease, including those generated with CRISPR/Cas9 for analysis anddesign, the interpretation of teleost ofmodels human of gene expression. Clarification of gene orthology and history is crucial including adaptive immunity and mineralized tissues, and the evolution tetrapods and elucidating the evolution of vertebrate and teleosts in reciprocally lost VGD ancientohnologs possessing by genome arrangement, allowing in the identification tetrapods of to orthologous genes teleosts bridges gar that show analyses Our models. biomedical teleost and tetrapods between bridge unique a provides lution, dense genetic map and ease of laboratory culture, this resource TGD. Because of gar’s phylogenetic position, slow rate of raysequence first evo the is Gar DISCUSSION genes. preduplication for expression of levels and patterns the the approximated usually and levels expression sum of their domains expression their of sum the that so evolved pairs by expected as gene, preduplication subfunctionalization quantitative the of level expression the approximately to evolve to tends pairs ohnolog for level expression ( ratios expression We next calculated average expression levels for each gene over over gene each for levels expression average calculated next We Taken together, our analyses indicate that, after the TGD, the after ohnolog that, Takenindicate together, our analyses - - derived derived ohnolog when both were retained and (iii) the averaged derived duplicates correlated significantly more closely with with closely more significantly correlated duplicates derived pte gr eoe t Ensembl, at genome gar Spotted 97 n.edu/syn - , 9 finned finned vertebrates, including human. Notably, the gar bridge 8 . . Gar genomic analyses show that formerlysequences con VOLUME 48 | NUMBER 4 | APRIL 2016 APRIL | 4 NUMBER | 48 VOLUME 2 2 l ; RepeatMasker, RepeatMasker, ; ( latus/ Supplementary Fig. 1 teny_db Fig. 6g Fig. Fig. Fig. 6g - finned fish genome sequence not affected by the the by affected not sequence genome fish finned Info/Inde , / h , h ; ; PhyloFish Portal, ). The ohnolog pair/gar expression ratios, ratios, expression pair/gar ohnolog The ). ). This finding suggests that the aggregate suggests ). This finding x 8 ; ; Synteny Database, ht Fig. 6e Fig. 9 . 89 tp://www.repeatmaske - , derived co derived 90 , 9 ) make gar a ray 6 , f . ); thus, ancestral gene sub gene ancestral thus, ); - - specific patterns of both both of patterns specific http://www.ensembl copy genes as estimated estimated as genes copy ht - tp://phylofish.sigen - tetrapod comparisons. orthologs ( orthologs

- Nature Ge Nature derived ohnologs, ohnologs, derived - htt - specific specific features, induced genome - p://syntenydb. finned species r.org Fig. 6e Fig. - n / finned finned . etics .org/ , ae. f ). ). ­ ­ ­ ­ ­ ­

© 2016 Nature America, Inc. All rights reserved. J.H., M.F., S.K. and P.F.S. studied miRNAs small RNA M.J.B., P.B., J.S. and J.H.P. annotated and analyzed miRNA genes on the basis of out the annotation and expression analysis of mineralization J.H.P. and J.A.Y. analyzed immune genes. K.K., M.I., M.M., P.B. and I.B. carried and J.S.T. analyzed opsin genes; and D.W., G.W.L., R.T.L., T.O., C.T.A.,N.R.S., C.C. analyzed Aldh genes; Y.S. and H.W. analyzed circadian clock genes; F.E.G.B. data. K.J.M. and P.W.H.H. analyzed Hox genes; J.F.M. analyzed ParaHox genes; data. karyotype J.J.S., I.B., J.S., J.H.L., J.C. and J.H.P. syntenyanalyzed conserved and A.M. performed phylogenomic and gene relative rate analyses. A.A. generated and B.A. annotated the genome. D.C., S.F., J. transcriptomes of gar, andmedaka bowfin, zebrafish. M.S.C., M.Y., D.B., S.M.J.S. Institute Genomics Platform. J.P., Y.G. and J.B. generated PhyloFish RNA A.A. prepared gar RNA for transcriptome sequencing and assembly by the Broad genome assembly and anchoring. A.A. and J.C. developed the gar genetic map. sequencing, and L.W. prepared libraries for genome sequencing. A.M.B. performed for genome and transcriptome sequencing. M.L. prepared DNA for genome management and coordination. Q.F. and A.F. provided gar and bowfin samples of genome sequencing management, and F.D.P. was in charge of overall project J.H.P., I.B., J.A. and K.L. (H.W.).(31030062) and (2012CB947600) the National Natural FoundationScience of China (NSFC) (C.T.A.); and the National Basic Research Program of China (973 Program) AI057559 (G.W.L. and J.A.Y.); US NIH grants R24 OD010922 and R01 GM079492 Generalitat de Catalunya, AGAUR (SGR2014 de Ciencia e Innovación (BFU2010 European Research Council grant 268513 (P.W.H.H. and K.J.M.); the Ministerio Agency for TechnologyScience, and Research (A*STAR), Singapore (B.V.); Laboratory (D.B., S.M.J.S. and B.A.); the Biomedical Research Council of the Trust (grants and andWT095908 WT098051) the European Molecular Biology Recherche (ANR) grant ANR of Education and Religious Affairs of Greece (T.M.); Agence Nationale de la National Strategic Reference Framework 2007–2013 (SPARCOMP, 36), Ministry Division (N.H.S.); NSF grant BCS0725227 (K.K.); call ‘ARISTEIA I’ of the (I.S.); the Brinson Foundation and the University of Chicago Biological Sciences and Technological Development (CNPq) grants 402754/2012 Laboratory Research Award 2014 (T.N.); Brazilian National Council for Scientific of Science Postdoctoral Research Fellowship 2012 Memorial Foundation Research Fellowship 2013, Japan Society for the Promotion (NSF) Doctoral Dissertation Improvement Grant 1311436 (A.R.G.); Uehara I/84 815 (I.B.); US NIH grant T32 HD055164 and National Science Foundation Foundation and the Volkswagen Foundation Initiative Evolutionary Biology, grantOD01119004 (J.H.P.); a Feodor Lynen Fellowship from the Alexander von Humboldtfurther supported by US NIH grants R01 OD011116 (alias R01 RR020833) and R24 National Research Institute grant U54 HG03067. This work was and Harvard University was supported by US National Institutes of Health (NIH)/ The generation of gar sequences and assemblies by the Broad Institute of MIT husbandry. We thank J. Westlund for the design of the illustrations.species State University) and the University of Oregon Fish Facility for gar work and assembly. We thank the teams of the Bayousphere Research Laboratory (Nicholls gar DNA and RNA libraries and J. Turner We thank the Broad Institute Genomics Platform for constructing and sequencing online version of the pape Note: Any Supplementary Information and Source Data files are available in the KU18927 accessions under GenBank from available are sequences gene Scpp medaka) and and bowfin gar, zebrafish, of transcriptomes (PhyloFish SRP04201 accessions under (SRA) Archive Read Sequence the from available accession under GenBank codes. Accession the in versio available are references associated any and Methods M Nature Ge Nature A.P.L. and B.V. outcarried CNE analyses for developmental gene loci. I.B., P.B., AUTHOR CONTRIBUTIONS A cknowledgments ethods SRP06394 n of the pape the of n - 4 seq seq data generated by T.D. (main text and 3 – (Broad Institute gar transcriptome), n KU18930 etics 2 (gar small RNA small (gar The spotted gar genome assembly is available from is available gar assembly genome The spotted

r - . T. planned and oversaw the project. J.J. was in charge r VOLUME 48 | NUMBER 4 | APRIL 2016 APRIL | 4 NUMBER | 48 VOLUME . 0 . - 10 - GENM GCA_0 - 14875 14875 and BFU2015 - - - seq for miRNA annotation). Gar Gar annotation). miRNA for seq 017 017 (PhyloFish; J.B.); the Wellcome in silico Maier for the gar transcriptome - 00242695. N.V. and A.M. analyzed TEs. T.M. - 290) 290) (C.C.); US NIH grant R01 - 127 and Marine Biological ( Supplementary Supplementary Note Supplementary Supplementary Note SRP04478 - 1 71340) 71340) and the RNA . - - 3 and 477658/2012 related genes. T.D., - 1 seq data are are data seq –

SRP04478

). ). V.R., - seq seq

). ). online online

-

1 4

5. 4. 3. 2. 1. visit license, licens this of copy a view To the material. the under reproduce included not is material to holder the license the from permissionobtain to ifneed will users license, Commons Creative line; credit the in otherwise indicated unless reprints/index.ht at online available is information permissions and Reprints The authors declare no competing interests.financial manuscript with input from other authors. gene expression analysis of gar, andmedaka zebrafish. I.B. and J.H.P. wrote the enhancer functions. J.P., I.B., P.B., J.H.P., Y.G. and J.B. performed comparative analyzed limb enhancer evolution. T.N.,A.R.G., I.S. and N.H.S. analyzed HoxD J.S. and J.H.P. performed whole 24. 23. 22. 21. 20. 19. 18. 17. 16. 15. 14. 13. 12. 11. 10. 9. 8. 7. 6. 26. 25. COMPETING FINANCIAL INTERESTS FINANCIAL COMPETING

es/by-nc-sa/3.0 reference habitats. reference ( killifish Atlantic of populations melanoma. of function. gene coding ancient vertebrate genome duplications. genome vertebrate ancient vertebrate. PGE2. with (2014). 220–226 numbers cell stem hematopoietic boosting granuloma. tuberculous vertebrate. short-lived disease. human for models mutant ecotoxicology. in (18 April 2013). April (18 Curr. PLoS in USA Sci. Acad. evolution Natl. Proc. and diversity mobilome highlights vertebrates. elements transposable of (2011). 491 management tool for second-generation genome projects. Evo-Devo. vertebrate of (2011). data. sequence parallel massively duplication.genometeleost the foroutgroup evolutionandmeiotic mapsmassivelyby parallel DNAsequencing: spotted gar, an Evol. Mol. J. fish. of teleost diversification the with correlates duplication genome fish-specific evolution. gnathostome evolution. tetrapod (2003). 382–390 fish. ray-finned of species 22000 by shared trait a duplication, 282 Toxicol.Pharmacol. C ( (2014). 2921–2923 (2000). Lee, O., Green, J.M. & Tyler, C.R. Transgenic fish systems and their application their and systems fish TransgenicTyler, C.R. & J.M. Green, O., Lee, A.M. Reitzel, models fish analyzing and Generating M. Schartl, & M.E. Mathers, E.E., Patton, Kettleborough, R.N. J.S. Nelson, Chalopin, D., Naville, M., Plard, F.,Plard, M., Naville, D., Chalopin, Volff,analysis & Comparative D. J.N. Galiana, genome-database and pipeline annotation an MAKER2: M. Yandell, & C. Holt, I. Braasch, S. Gnerre, Genome J.H. Postlethwait, & Q. Fontenot, A., Ferrara, J., Catchen, A., Amores, the of timing Phylogenetic A. Meyer, & J.S. Taylor, H., Brinkmann, S., Hoegg, B. Venkatesh, C.T. Amemiya, Genome Y. Peer, de Van & A. Meyer, T., Frickey, I., Braasch, J.S., Taylor, A. Amores, I. Braasch, S.R. Frankenberg, is. it think you where not Robustness—it’s K. Wolfe, of resolution improves map meiotic lamprey sea The M.C. Keinath, & J.J. Smith, ancestral the in duplication genome whole of Tworounds J.L. P. Boore, Dehal, & the in immunopathology and Immunity Hagedorn, E.J., Durand, E.M., Fast, E.M. & Zon, L.I. Getting L. more for your marrow: Ramakrishnan, & A.J. Pagán, Harel, I. Evolutionary J.H. Postlethwait, & III H.W. Detrich, W., Cresko, R.C., Albertson, eacrR R. Betancur-R, Near, T.J. Prrx , 1711–1714 (1998). 1711–1714 , ) genes in spotted gar, basal teleosts, and tetrapods. et al. et al. This work is licensed under a Creative Commons Attribution-NonCommercial- n hs ril ae nldd n h atce Cetv Cmos license, Commons Creative articles the in included are article this in ShareAlikeUnported3.0 License. imagesTheotherorthird party material m PLoS Biol. PLoS et al. et et al. et t al. et

t al. et Genome Biol. Evol. Biol. Genome

l Fishes of the World the of Fishes . 59 A platform for rapid exploration of aging and diseases in a naturally et al. et / doi:10.1371/curre Resolution of ray-finned fish phylogeny and timing of diversification. Methods Cell Biol. Cell Methods . t al. et t al. et , 190–203 (2004). 190–203 , t al. et Zebrafish A new model army: emerging fish models to study the genomics the study to models fish emerging army: model new A Crit. Rev. Toxicol.Rev. Crit. ihqaiy rf asmle o mmain eoe from genomes mammalian of assemblies draft High-quality onciiy f etbae eoe: ardrltd homeobox paired-related genomes: vertebrate of Connectivity t al. et Genetic variation at aryl hydrocarbon receptor ( receptor hydrocarbon aryl at variation Genetic BMC Evol. Biol. Evol. BMC Nature et al. The tree of life and a new classification of bony fishes. bony of classification new a and life of tree The

lpat hr gnm poie uiu isgt into insights unique provides genome shark Elephant

h Arcn olcnh eoe rvds nihs into insights provides genome coelacanth African The Cell 163 Nature 3 Nature J. Exp. Zool. B Mol. Dev. Evol. Dev. Mol. B Zool. Exp. J. , e314 (2005). e314 , Cold Spring Harb. Perspect. Med. Perspect. Harb. Spring Cold h PUe o gn nomenclature. gene of POU-er The A systematic genome-wide analysis of zebrafish protein-

- , 24–36 (2014). 24–36 , hox 496 160

genome alignments and CNE global analyses. I.B. 109

496 clusters and vertebrate genome evolution. genome vertebrate and clusters

, 311–316 (2013). 311–316 , , 1013–1026 (2015). 1013–1026 , 505

, 13698–13703 (2012). 13698–13703 , 7 4th edn (John Wiley,2006). (John edn 4th nts.tol.53ba26640df0

udls heteroclitus Fundulus , 567–580 (2015). 567–580 , 105 , 494–497 (2013). 494–497 , Proc. Natl. Acad. Sci. USA Sci. Acad. Natl. Proc.

, 174–179 (2014). 174–179 , 45 TrendsGenet. 14 , 339–366 (2011). 339–366 , , 124–141 (2015). 124–141 , , 6 (2014). 6 , Genome Res. Genome Genetics

25

, 74–81 (2009). 74–81 , http://creati 188

ccaee75bb165c8c2628 ihbtn plue and polluted inhabiting )

25 Comp. Biochem. Physiol. 324 http://www.nature.c BMC Bioinformatics s e l c i t r A , 1081–1090 (2015). 1081–1090 , , 799–808 (2011).799–808 ,

a. Genet. Nat. x. el Res. Cell Exp. 5 , 316–341 (2015). 316–341 , , a018499 (2015). a018499 ,

Development 108 Genome Res. Genome vecommons.org/ , 1513–1518 , AHR

25 ) loci in loci ) Science

3–4 ,

435 329 141

om/ 12 13 8 , , , ,

© 2016 Nature America, Inc. All rights reserved. 57. 56. 55. 54. 53. 52. 51. 50. 49. 48. 47. 46. 45. 44. 43. 42. 41. 40. 39. 38. 37. 36. 35. 34. 33. 32. 31. 30. 29. 28. 27. s e l c i t r A 436

ytm gntc vns n slcie pressures. (2010). selective and events genetic system: zebrafish. in genes I class complex zebrafish. of Biol. Evol. BMC vertebrates. of 13 part large a in system DM peptide-loading of the dispensability reveals genomes fish teleost in genes II class MHC of analysis fishes. bony in loci II zebrafish, (1997). the in groups linkage gland. pineal (1999). zebrafish the genome in vertebrate early of rounds two the in duplications. region genomic shared their α 80 genes. duplicated for fates distinct reveals analysis comparative a fish. ray-finned in Evol. 5 ofsurviving paralogs: ALDH1A and retinoic acid signaling in vertebrate genomes. evolution onfunctional loss gene lineage-specific of Consequences development. mouse Dev. Genet. Opin. Curr. coelacanths. and genomes. zebrafish and pufferfish (2006). 75–82 of those with comparison and Evol. Biol. paddlefish—a American the representative basal ray-finned fish and important comparative reference. in paralogs Hox from inferred duplication genome Evol. Dev. Mol. B Zool. Exp. J. (2004). 820–828 mapping. gene comparative by revealed proto-chromosomes early in reorganization genome dynamic vertebrates. reveals genome ancestral vertebrate 85 and their remarkable similarity to those of one of the most ancient frogs. fishes. Dev. Genet. Opin. Curr. land. to water from radiation. vertebrate largest the across Life for Struggle the in 1859). Races Favoured of Preservation the or biodiversity. of resurrection The anatomy.Holostei. skeletal on of mostly based sequencing species, related closely upon targeted based fishes ray-finned (UCEs). of elements ultraconserved radiation the on evolution. fish 2013). April bony (16 of Curr. tempo and pattern PLoS the reveals analysis phylogenetic Rennison, D.J., Owens, G.L. & Taylor,& G.L. Owens, D.J., Rennison, divergence and duplication gene Opsin J.S. genomes. fish teleost in genes period of analysis Comparative H. Wang, J.H. Postlethwait, & H. Yokoi, A., Rodríguez-Marí, J.M., Catchen, C., Cañestro, during differentiation cell progenitor of control in signaling Retinoid G. Duester, Vertebrate D. Duboule, Mulley,P.W.J.F.Holland, of & retention Parallel Kurosawa, G. gonemissing. Crow, K.D., Smith, ohnologs C.D., Cheng, J.F., in context: Wagner, G.P. & Amemiya, genome C.T. An independent zebrafish The J.H. Postlethwait, K. Naruse, the of Reconstruction S. Morishita, & Y. Kohara, H., Takeda, Y., Nakatani, Bogart, J.P., Balon, E.K. & Bruton, M.N. The chromosomes of the living coelacanth S. Ohno, diversity. teleost and genomes fish evolving Rapidly B. Venkatesh, & V. Ravi, Nikaido, M. Rabosky, D.L. () fish ray-finned C. of origins Darwin, the in issues Major L.C. Sallan, and (Lepisosteiformes) gars of study pattern synthetic empirical An L. Grande, Faircloth, B.C., Sorenson, L., Santini, F. & Alfaro, M.E. A phylogenomic perspective Multi- G. Ortí, & G. Arratia, C., Li, R., Betancur-R, R.E., Broughton, lji, .. Kshr, . rgn n eouin f h aatv immune adaptive the of evolution and Origin M. Kasahara, & M.F. Flajnik, Dirscherl, H. & Yoder, J.A. Characterization of the Z lineage major histocompatability Yoder,S.C., McConnell, H., Dirscherl, genes I class MHC The J.L. Jong, de & J.A. U. Grimholt, Dijkstra, J.M., Grimholt, U., Leong, J., Koop, B.F. & Hashimoto, K. Comprehensive A. Sato, Bingulac-Popovic, J. expressed rhodopsin novel a Exo-rhodopsin: Y. Fukada, & D. Kojima, H., Mano, D. Lagman, fish: teleost of clock circadian The J.C. Opazo, Toloza-Villalobos,& J.I. Arroyo, J., , e1000496 (2009). e1000496 , subunits and oxytocin/vasopressin receptors was established by duplication of duplication by established was receptors oxytocin/vasopressin and subunits , 260 (2013). 260 , (2015). 57–64 , (1994). 322–325 ,

67 Chromosoma , 29–40 (2008). 29–40 , et al. et et al. et Copeia

4 et al.

et al. et Genome Res. Genome , 937–953 (2012). 937–953 , Biol. Rev. Camb. Philos. Soc. Philos. Camb. Rev. Biol. t al. et Dev. Comp. Immunol. Comp. Dev. et al. et n h Oii o Seis y en o Ntrl Selection, Natural of Means by Species of Origin the On et al. BMC Evol. Biol. Evol. BMC et al. Nonlinkage of major histocompatibility complex class I and class and I class complex histocompatibility major of Nonlinkage Microchromosomes in holocephalian, chondrostean and holostean and chondrostean holocephalian, in Microchromosomes doi:10.1371/curre

15 Coelacanth genomes reveal signatures for evolutionary transition 10 The vertebrate ancestral repertoire of visual opsins, transducin opsins, visual of repertoire ancestral vertebrate The Mol. Biol. Evol. Biol. Mol. Mol. Phylogenet. Evol. Phylogenet. Mol. A comprehensive analysis of teleost MHC class I sequences. I class MHC teleost of analysis comprehensive A Organization and structure of mdk gn mp te rc o acsrl vertebrate ancestral of trace the map: gene medaka A Rates of speciation and morphological evolution are correlated , 32 (2015). 32 ,

(supplementary issue 2A), 1–863 (2010). 1–863 2A), issue (supplementary Genome Res. Genome et al. 26 Semin. Cell Dev. Biol. Dev. Cell Semin. Immunogenetics

, 35–40 (1969). 35–40 , 8 18 Mapping of MHC class I and class II regions to different

, 514–518 (1998). 514–518 , hox 17 , 544–550 (2008). 544–550 , , 1254–1265 (2007). 1254–1265 ,

ee euain cutrn ado colinearity? and/or clustering regulation: gene 13

308 nts.tol.2ca8041495ff , 238 (2013). 238 ,

PLoS One PLoS ri Rs Ml Ban Res. Brain Mol. Res. Brain 27 23

ai rerio Danio 46 , 563–577 (2007). 563–577 , Immunogenetics , 2386–2391 (2010). 2386–2391 , , 1740–1748 (2013). 1740–1748 , , 11–23 (2014). 11–23 ,

Nat. Commun. Nat. 51

62

, 108–116 (2000). 108–116 , 89

8 24 , 986–1008 (2012). 986–1008 , , e65923 (2013). e65923 , , 950–971 (2014). 950–971 , . , 694–700 (2013). 694–700 , hox Immunogenetics Pdx2 a. e. Genet. Rev. Nat. gene loci in medaka genome

66 genes in cartilaginous fish cartilaginous in genes

afd0c92756e75247483 4 , 185–198 (2014). 185–198 , , 1958 (2013). 1958 , eoe Res. Genome M Eo. Biol. Evol. BMC

Jh Murray, (John 73 46 J. Mol. Evol. Mol. J.

PLoS Genet. 110–118 , 129–134 , 11 Gene J. Hered. 47–59 , Genome J. Mol. J.

370

14 e , ,

69. 68. 67. 66. 65. 64. 63. 62. 61. 60. 59. 58. 91. 85. 84. 83. 82. 81. 80. 79. 78. 77. 76. 75. 74. 73. 72. 71. 70. 90. 89. 88. 87. 86.

hnig eei bss f etbae teeth. vertebrate of basis genetic changing Immunol. recognition. allogeneic (2008). mediates family multigene USA Sci. genes. TCR immunity. fish. teleost in pathway developmental cell B distinctive a for implications trout: rainbow in (IgT) isotype formation. of scales (2014). sarcopterygians. by shared genes gnathostomes. in Callorhinchus tissues. mineralized in (2005). 18063–18068 ofapreviously and expression Z. immunoglobulin isotype, identification unknown zebrafish: in locus heavy-chain Genet. partitioning, the teleost radiation and the annotation of the human genome. (1999). 169–181 mutations. limbs. mouse in collinearity enhancers. to Dev. Genet. Opin. Curr. USA Sci. Acad. Natl. Proc. (2007). D88–D92 enhancers. human tissue-specific of database Browser—a (2013). of regulation digits. sequences. development. vertebrate (2009). 50–68 (2011). 203–215 emergence. and modification, processing, their into insights . in RNAs species. of 2012). (Springer, 341–383 17, Ch. D.E.) Soltis, P.S.& (2011). data. deep-sequencing RNAsequencing. and annotation onsmall based microRNAs zebrafish Gene of annotation the (Basel) Life characters. China Sci. Mater. Front. gars, in enamel collar the of development Sire, J.Y. Light and TEM study of nonregenerated and experimentally regenerated experimentally and nonregenerated of study J.Y.TEM Sire, and Light Kawasaki, K. & Amemiya, C.T. SCPP genes in the coelacanth: tissue mineralization of genome The P.E. Ahlberg, & P. Tafforeau, T., Haitina, S., Sanchez, B., Ryll, Kawasaki, K. The SCPP gene repertoire in bony vertebrates and graded differences the evolution: in drift Phenogenetic K.M. Weiss, & T. Suzuki, K., Kawasaki, fish. bony in NITRs of phylogenetics and function Form, Yoder,J.A. J.P. Cannon, J.C. Roach, dynamic The R.D. Miller,M.F. & M.F.,Flajnik, Y., Criscitiello, Ohta, Z.E., Parra, Y.A. Zhang, heavy-chain Ig unique a of Discovery R.B. Phillips, & E.D. Landis, J.D., Hansen, immunoglobulin The L.A. Steiner, & K. Jekosch, J., Bussmann, N., Danilova, otehat J, mrs A, rso W, igr A & a, .. Subfunction Y.L. Yan, & A. Singer, W., Cresko, A., Amores, J., Postlethwait, A.R. Gehrke, Enhancer VISTA L.A. Pennacchio, & I. Dubchak, S., Minovitsky, A., Visel, S. Berlivet, T. Montavon, L.A. Pennacchio, A. Woolfe, B.M. Wheeler, Berezikov, E. A. Grimson, microRNA Loh, Y.H., Yi, S.V. & Streelman, J.T. integrating Evolution of microRNAs and the diversification miRBase: in J.H. Postlethwait, & S. I. Braasch, Griffiths-Jones, & A. Kozomara, Expanding J.H. Postlethwait, & J. P.,Sydes, Batzel, M.J., Beam, T., Desvignes, revisited. families microRNA of expansion P.F. Stadler, The & J. Hertel, M. Zhu, and structure Fine M. Mikami, & H. Yokosuka, M., Ishiyama, I., Sasagawa, Stoltzfus, A. On the possibility of constructive neutral evolution. neutral constructive of possibility the On A. Stoltzfus, A. Force, G. Andrey, expeditions from limb: tetrapod the of origin The N.H. Shubin, & I. Schneider, Zakany, J. & Duboule, D. The role of δ

TCR : 546

Cell Eur. J. Immunol. J. Eur. 20 VOLUME 48 | NUMBER 4 | APRIL 2016 APRIL | 4 NUMBER | 48 VOLUME

t al. et

et al. et 102 , 386–389 (2014). 386–389 , , 481–490 (2004). 481–490 , 33 eiotu oculatus Nat. Immunol. Nat.

Anat. Rec. Anat. δ Genetics

Genome Biol. Evol. Biol. Genome 147 t al. et Nature t al. et Nature et al. et his n h the in chains et al. et 5 et al. et , 135–144 (2009). 135–144 , et al. and the fossil record: a new perspective on SCPP gene evolution gene SCPP on perspective new a record: fossil the and t al. et , 905–920 (2015). 905–920 , et al. et , 9577–9582 (2005). 9577–9582 , TrendsGenet. h ods atcltd sectyn msi gnathostome mosaic reveals osteichthyan articulated oldest The t al. et HoxA et al. et Preservation of duplicate genes by complementary, degenerative complementary, by genes duplicate of Preservation t al. et , 1132–1145 (2011). 1132–1145 , The evolution of vertebrate Toll-likevertebrate of evolution The receptors. sic bten oooia dmis underlies domains topological between switch A ihy osre nncdn sqecs r ascae with associated are sequences non-coding conserved Highly et al. et IgT, a primitive immunoglobulin class specialized in mucosal in specialized class immunoglobulin primitive a IgT, Early origins and evolution of microRNAs and Piwi-interacting and microRNAs of evolution and origins Early Evol. Dev. Evol. Deep annotation of

Nature

A regulatory archipelago controls archipelago regulatory A 458 444 Clustering of tissue-specific sub-TADs accompanies the accompanies sub-TADs tissue-specific of Clustering ee i dvlpn limbs. developing in genes

The deep evolution of metazoan microRNAs. metazoan of evolution deep The ep osrain f rs ad ii ehnes n fish. in enhancers digit and wrist of conservation Deep 151

A bony fish immunological receptor of the NITR of receptor immunological fish bony A 240

In vivo In , 469–474 (2009). 469–474 , Dev. Genes Evol. Genes Dev. , 499–502 (2006). 499–502 ,

Proc. Natl. Acad. Sci. USA Sci. Acad. Natl. Proc. 17 PLoS Biol. PLoS , 1531–1545 (1999). 1531–1545 , 40

2

455 11 , 134–142 (2008). 134–142 , , 189–207 (1994). 189–207 , , 359–366 (2007). 359–366 ,

, 2319–2329 (2010). 2319–2329 ,

112

16 Science , 827–835 (2010). 827–835 , 29 enhancer analysis of human conserved non-coding conserved human of analysis enhancer , 1193–1197 (2008). 1193–1197 ,

, 123–124 (2014). 123–124 , Hlse) ih atclr teto t to attention particular with (Holostei) 3 , 419–426 (2013). 419–426 , , 803–808 (2015). 803–808 , , 55–65 (2011). 55–65 , J. Exp. Zool. B Mol. Dev. Evol. Dev. Mol. B Zool. Exp. J. Polyploidy and Genome Evolution Genome and Polyploidy Hox

3

340 , e7 (2005). e7 , Nat. Immunol. Nat. Drosophila melanogaster genes during vertebrate limb development. eou tropicalis Xenopus

219 , 1234167 (2013). 1234167 , uli Ais Res. Acids Nucleic eiotu oculatus Lepisosteus , 147–157 (2009). 147–157 , rc Nt. cd Si USA Sci. Acad. Natl. Proc.

LS Genet. PLoS 102

6 Hox

, 295–302 (2005). 295–302 , Immunity , 6919–6924 (2005). 6919–6924 , Nature Ge Nature tlz atbd-ie V antibody-like utilize Nucleic Acids Res. Acids Nucleic genes transcription in transcription genes

microRNAs yields eoe Res. Genome 39 Proc. Natl. Acad. Natl. Proc. J. Mol. Evol. Mol. J.

Actinopterygii. ,

322 29 D152–D157 , 9 Evol. Dev. Evol. e1004018 , (eds. Soltis, (eds. HoxD 228–237 , Dev. Comp. Dev. , 390–402 , n etics genes Trends

102

35 11 21 49 , , , , ,

© 2016 Nature America, Inc. All rights reserved. Oxford, Oxford, UK. Molecular Biology Laboratory, European Institute, Bioinformatics Wellcome Trust Genome Campus, Hinxton, UK. of Human Genetics, University of Utah, Salt Lake City, Utah, USA. Animal Biology, University of Illinois, Illinois, USA. Urbana-Champaign, Recherche Agronomique (INRA), UR1037 Laboratoire de Physiologie et Génomique des Poissons (LPGP), Campus de Beaulieu, Rennes, France. USA. Pennsylvania, USA. 1 North North Carolina, USA. 16 for Science, Technology and Research (A*STAR), Singapore. J.H.P. ( Vertebrate and Health Genomics, Genome Analysis Center, Norwich, UK (F.D.P.). should Correspondence be addressed to I.B. ( Department of Proteomics, Helmholtz Centre for Research Environmental (UFZ), Leipzig, Germany (J.H.), ecSeq Leipzig, Bioinformatics, Germany (M.F.) and (D.C.), Department of Genetics, University of USA Pennsylvania, Philadelphia, Pennsylvania, (S.F.), Young Group Investigators and Bioinformatics Transcriptomics, Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK (K.J.M.), Department of Genetics, University of Georgia, Athens, Georgia, USA Integrative Biology, Michigan State University, East Lansing, Michigan, USA (I.B.), Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA (M.S.C.), 33 Universidade Federal do Pará, Belem, Brazil. Institute, Seattle, Washington, USA. of Evolutionary Studies of Biosystems, SOKENDAI (Graduate University for Advanced Studies), Hayama, Japan. of Medicine, St. Petersburg, Florida, USA. 25 Sciences, Medical College, Soochow University, Suzhou, China. University of Victoria, Victoria, British Columbia, Canada. Genètica, Universitat de Barcelona, Barcelona, Spain. 95. 94. 93. 92. Nature Ge Nature Institute Institute of University Neuroscience, of Oregon, Eugene, Oregon, USA. Science Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden. Department of Dental Hygiene, Nippon Dental University College at Niigata, Niigata, Japan. Department of Biology, University of Konstanz, Konstanz, Germany.

(2009). conserved synteny after whole-genome duplication. vertebrates. in trees phylogenetic evolution. gene (2005). 1157–1164 duplicate in neofunctionalization substantial Ohno, S. Ohno, of identification Automated J.H. Postlethwait, & J.S. Conery, J.M., Catchen, A.J. Vilella, and prolonged by accompanied subfunctionalization Rapid J. Zhang, & X. He, 3 Department Department of Biology, University of Kentucky, Lexington, Kentucky, USA. [email protected] 13 n School School of Biological Sciences, Bangor University, Bangor, UK. Evolution by Gene Duplication Gene by Evolution etics t al. et 5 Institute Institute of Marine Biology, and Biotechnology Aquaculture, Hellenic Centre for Marine Research, Heraklion, Greece. 18

Center Center for Comparative Medicine and Translational Research, North Carolina State University, Raleigh, North Carolina, USA. nebCmaa eere: opee duplication-aware complete, GeneTrees: EnsemblCompara VOLUME 48 | NUMBER 4 | APRIL 2016 APRIL | 4 NUMBER | 48 VOLUME 30 ). Genome Res. Genome Department Department of Biological Sciences, Nicholls State University, Thibodaux, Louisiana, USA. 27 (Springer-Verlag, 1970). (Springer-Verlag, Department Department of Microbiology, Nippon Dental University School of Life Dentistry at Niigata, Niigata, Japan. 32 International Max International Planck Research School for Organismal Biology, University of Konstanz, Konstanz, Germany.

19 Genome Res. , 327–335 (2009). 327–335 , 20 Institut Institut de Recerca de la Universitat Biodiversitat, de Barcelona, Barcelona, Spain. 22 Center Center for Circadian Clocks, Soochow University, Suzhou, China. 15 Institut Institut de Génomique de Fonctionnelle Lyon, Ecole Normale Supérieure de Lyon, Lyon, France. 24

Bioinformatics Group, Bioinformatics Department of Computer Science, Universität Leipzig, Leipzig, Germany. 19 10 Genetics

, 1497–1505 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK. 17 Department Department of Molecular Biomedical Sciences, North Carolina State University, Raleigh, 8 2 Broad Broad Institute of MIT and Harvard, Cambridge, USA. Massachusetts, Department Department of Organismal Biology and Anatomy, University of Chicago, Chicago, Illinois,

169 14 Comparative Comparative Genomics Laboratory, Institute of Molecular and Cell Biology, Agency , 4 Department Department of Anthropology, Pennsylvania State University, University Park, 99. 98. 97. 96.

genomics. genomics. system. embryos. rlne pro o aymti eouin olw ee ulcto i yeast. in duplication gene follow evolution Res. Genome asymmetric of period prolonged Krzywinski, M. M. Krzywinski, W.Y. Hwang, N. Chang, a and evolution sequence protein of burst A K.H. Wolfe, & D.R. Scannell, 26 Department Department of Pediatrics, University of South Florida Morsani College Nat. Biotechnol. Nat. Cell Res. Cell t al. et Genome Res. Genome

t al. et 18 eoe dtn wt RAgie Cs nces i zebrafish in nuclease Cas9 RNA-guided with editing Genome , 137–147 (2008). 137–147 , t al. et

Efficient genome editing in zebrafish using aCRISPR-Cas using inzebrafish editing genome Efficient 23 , 465–472 (2013). 465–472 , 29 12 Circos: an information aesthetic for comparative comparative for aesthetic information an Circos: Molecular Molecular Genetics Program, Benaroya Research

31

Department Department of Zoology, University of Oxford, 19 , 227–229 (2013). 227–229 , , 1639–1645 (2009). 1639–1645 , 34 23 Present Present addresses: Department of [email protected] School School of Biology and Basic Medical 31 Instituto Instituto de Ciências Biológicas, 6 Institut Institut National de la 21 Department Department of Biology, 9 s e l c i t r A Eccles Eccles Institute ) ) or 7 Department Department of 19

11 Departament Departament de 28 European European Department Department 437

© 2016 Nature America, Inc. All rights reserved. Gene family analyses. family Gene refs. in described as derived maps synteny comparative (iii) and 74); version (Ensembl assembly genome gar the of integration after the in described as and 75 plots Circos (i) with ( teleosts and chicken) and (human zebrafish for described as ( established cultures cell fibroblast fin caudal analyses. structure Genome two with Tajima’s tests rate relative ( RAxML with out carried Western evolving turtle slowly painted ( bowfin and gar from sequences gous analysis genome coelacanth the for described analyses. rate ( evolutionary and Phylogenomic ( sequence TE consensus to TE corresponding the copies of individual distances Kimura the using the RepeatMasker. genome The with TE age was determined profile ( library single a into the nomenclature Wicker’s on of performed basis was TEs gar of RepeatModeler) and RepeatScout (using elements. of transposable Annotation likely has close to the true number of gar protein respectively. The gene set with 21,443 high Note scripts along with 42 pseudogenes and 2,595 noncoding RNAs ( gene annotation pipeline identified 18,328 protein ( SwissProt(ref.proteins,MAKER2 transcriptomes ( Genome annotation. package Velvet/Oases the using assembled and medaka) intestine) and one stage embryonic (‘pigmented eye’ stage of gar, zebrafish and and bone kidney, liver, muscle, heart, gills, brain, testis, (ovary, tissues adult ( medaka and zebrafish fin, Trinity using assembled 8 ( transcriptomes. RNA-seq ( chromonome assembly the in of bases or 94.2% site–associated DNA (RAD) tag markers were anchored in 29 linkage groups using 2,153 of 8,406 meiotic map restriction those of other vertebrate Illumina genome assemblies size of 68.3 kb, a scaffold N50 size of 6.9 Mb and quality comparable metrics to plus gaps contigs. The between spotted gar genome assembly has a contig N50 sequence of Mb 869 of composed is and size in Mb 945 is assembly draft The into LepOcu1 (GenBank accession assembled and coverage 90× to libraries jumping and technology sequencing Louisiana ( Chevreuil, and assembled using DNA from a single adult female gar wild Gar genome sequencing and assembly. 12 IACUC protocol A Number Assurance Welfare (Animal Committee Use and Care the in found Animal of Institutional approved work by was Oregon University the Animal be can methods of description full A ONLINE Nature Ge Nature the SupplementaryNote Supplementary NoteSupplementary Supplementary Note Supplementary Supplementary Note Supplementary Supplementary Note Supplementary - day larvae, eye, liver, heart, skin, muscle, kidney, brain and testis) and and testis) and brain kidney, muscle, skin, heart, liver, eye, larvae, day Supplementary Supplementary Note ). Annotations for 762 and 6,877 genes are specific to Ensembl and MAKER, - cluster tests cluster

Supplementary Note Supplementary METHODS n etics Supplementary Note - 9 02RA). Using evidence from the Broad Institute and PhyloFish gar 9 10 Supplementary Note Supplementary ). Using the Broad InstituteBroadUsingthe). transcriptome, Ensembl the Supplementary Note Supplementary ). ). Analyses of synteny conserved gar,between tetr on the basis of orthology relationships from Ensembl Ensembl from relationships orthology of basis the on Individual gene families were analyzed as described in as were described analyzed gene families Individual ) was generated from ten tissues (stage 28 embryo 28 (stage tissues ten from generated was ) ) were based on protein on based were ) ) were performed at the protein alignment level with with level alignment protein the at performed were ) 8 10 . . . RT 10 10 1 5 Supplementary Note Supplementary The Broad Institute gar RNA gar Institute Broad The . PhyloFish RNA PhyloFish . 7 and PhyloBayes MPI PhyloBayes and and at the level of the reconstructed phylogenies phylogenies reconstructed of the level at and the Supplementary Note Supplementary The from was spotted gar determined karyotype - PCR PCR and sequencing was performed to annotate 10 ). 3 AHAT00000000. 2 , and identified elements were combined combined were elements identified and , 3 ), all RefSeq teleost proteins and all UniProt/ Supplementary Note Supplementary ) annotated 25,645 proteinannotated25,645 ) 10 The spotted gar genome was sequenced - Manual and automated classification classification automated and Manual Supplementary Note Supplementary 2 confidence genes predicted by MAKER 4 0 . Phylogenetic reconstructions were reconstructions . Phylogenetic , , thus capturing 891 Mb of sequence ). ). It was using sequenced Illumina - ), which was then used to mask mask to used then was which ), seq transcriptomes of gar, bow gar, of transcriptomes seq - - 1 coding genes. coding sequence alignments alignments sequence coding - 7 ; (ii) the Synteny Database Synteny the (ii) ; Supplementary Note Supplementary coding genes for 22,470 tran but updated with ortholo with updated but 10 ) were generated from ten ten from generated were ) 1 6 ) using ALLPATHS 2 . Molecular rate analyses analyses rate Molecular . Phylogenetic analyses analyses Phylogenetic 1 Supplementary Note Supplementary . . A total of 209 scaffolds 17 10 - 2 , seq transcriptome transcriptome seq ) were performed performed were ) 11 . - caught in Bayou 0 Supplementary ) and from the the from and ) . - coding genes coding - 3009 ). - a LG pods pods - 10 01, 01, 10 2 9 0 1 4 9 ­ ­ ­ , . .

109. 108. 107. 106. 105. 104. 103. Prost! (ref. with and annotated and were processed ovary, testis which heart, brain, adult zebrafish for described as performed ( miRNAs gar of data sequencing the on based RNAfold with firmed miRBase from miRNAs ( analysis. and annotation miRNA ( scales and jaw teeth, gar from libraries and analyze gene expression of Scpp mineralization 102. 101. 100. BEDtools whole from established was to gar through zebrafish to human from SNPs GWAS Genome embedded and curation. CNEs of synteny conserved and alignments CNEs. obtain to out Evolution offiltered human were limb enhancers sequences repetitive and elements genic of fourfold We phyloFit used alignments. were generated with MultiZ Note mental gene loci was performed using SLAGAN Analysis of conserved noncoding elements. Database Synteny the Ensembl using established were synteny conserved on based orthologies for the detection of neofunctionalization and subfunctionalization. and neofunctionalization of detection the for criteria and expression pair of ohnolog definition the including information, (ref. R using determined its gar and ortholog of singletons, ohnolog 1, ohnolog 2 and ohnolog pairs was and gene medaka or zebrafish each between expression of levels relative and DESeq using tissues 11 the across with gene each of BWAsequence coding reference Ensembl longest the against ( transcriptomes PhyloFish the from Database GeneTrees Compara Ensembl from information phylogenetic were integrating by (co)orthologs generated gar their and medaka and zebrafish of singletons TGD analyses. expression gene Comparative ( respectively zebrafish for vectors LacZ pXIG into cloned were elements CNS65 VISTAwith were CNS65 identified (LAGAN) enhancer HoxD enhancer functional analysis. upeetr Note Supplementary

). ). Gar - Methods Cell Biol. Cell Methods trees. linearized and Genetics environment. Biol. inaparallel of profiles mixtures infinite with reconstruction phylogenies. large of 14 lineage. evolving slowly a in adaptations physiological extreme of elements. 28 levels. expression of range dynamic the across assembly RNA-seq genome. reference a without Lepisosteus osseus Lepisosteus Amores, A. & Postlethwait, J.H. Banded chromosomes and the zebrafish karyotype. clock molecular the of test Phylogenetic M. Nei, & A. Rzhetsky, N., Takezaki, Tajima, F. Simple methods for testing the molecular evolutionary clock hypothesis. phylogenetic MPI: PhyloBayes Richer,J. & D. Stubbs, N., Rodrigue, N., Lartillot, Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis Shaffer,H.B. T. Wicker, cuz MH, ebn, .. Vnrn M & iny E Oss robust Oases: E. Birney, & M. Vingron, D.R., Zerbino, M.H., Schulz, gar, longnose the M.G. of Grabherr, stages embryonic Normal W.W. Ballard, & W.L. Long, Bowtie , R28 (2013). R28 , (2012). 1086–1092 ,

9 12 62 4 - - 11 9 ( degenerate degenerate sites to elements identify conserved with phastCons 6 , , zebrafish 3 ( , 611–615 (2013). 611–615 , (Ensembl 74) and conserved synteny data from the Synteny Synteny the from data synteny conserved and 74) (Ensembl

upeetr Note Supplementary 5 128 Supplementary Note Supplementary 135 Supplementary Note Supplementary ) according to miRNA guidelines Nat. Rev. Genet. Rev. Nat. t al. et , 12 , 599–607 (1993). 599–607 , et al. et 9 , counted with SAMtools with , counted t al. et A unified classification system for eukaryotic transposable foreukaryotic system classification unified A 9

- The Western painted turtle genome, a model for the evolution the for model a genome, turtle Westernpainted The 11 . 4 60 and human and Genomicus and BMC Dev. Biol. Dev. BMC ) by BLAST comparison of teleost and tetrapod tetrapod and teleost of comparison BLAST by ) 4 74 Mol. Biol. Evol. Biol. Mol. Bioinformatics (see also ref. ref. also (see ullnt tasrpoe seby rm N-e data RNA-Seq from assembly transcriptome Full-length , 323–338 (1999). 323–338 , , 13 111– 12 12 7 2 and mouse (Cyagen Biosciences) transgenesis, transgenesis, Biosciences) (Cyagen mouse and 2

). See the the See ). 12 on the basis of lastZ 11 8 , 973–982 (2007). 973–982 , Nat. Biotechnol. Nat. 4 3 to generate a neutral model of the evolution evolution of the model a neutral to generate against the gar genome assembly and con and assembly genome gar the against - genome alignments using liftOver using alignments genome 13 33 - ). . o al he seis RNA species, three all For ). ). Gar and teleost orthologs of the HoxD early centric 13 centric 1 , 82– . The correlation of expression patterns patterns expression of correlation The . Gar miRNAs were studied studied were miRNAs Gar

30 1

7 upeetr Note Supplementary 12 7 3 , 6 (2001). 6 , 8 118 Supplementary Note Supplementary by using small RNA small using by Supplementary Note Supplementary 2 4 , 1312–1313 (2014). 1312–1313 , Curated lists of TGD ohnologs and and ohnologs TGD of lists Curated was established using whole , 823–833 (1995). 823–833 , ). miRNA annotation and analyses analyses and annotation miRNA ). - , cFos 11 Investigation of CNEs in develop 13 9 12 - 0 .

way way multi and normalized for each gene gene for each normalized and 29 0 - in VISTA Supplementary Note Supplementary eGFP and Gateway and eGFP , 644–652 (2011). 644–652 , 12 - related genes using cDNA 3 pairwise whole pairwise 12 - doi: genome alignments alignments genome - 1 12 wide connectivity connectivity wide . . Gar and zebrafish 1 ( 10.1038/ng.3526 ) were mapped mapped were ) Supplementary - ). for additional additional for seq data from from data seq Bioinformatics Genome Biol. Genome 11 - seq reads reads seq 6 - ; miRNA - - n silico in genome genome genome Hsp68 12 e novo de ) were were ) 5 and and Syst. 11 12 7 4 ­ ­ - ; ,

© 2016 Nature America, Inc. All rights reserved. 120. 119. 118. 117. 116. 115. 114. 113. 112. 111. 110. doi: 10.1038/ng.3526

Bioinformatics eukaryota. (2013). in genomes. genomics ancestral and comparative modern in synteny gene study to Bioinformatics browser a and (2013). variants. (2015). sequence and pathways, biosynthetic analysis. experiment sequencing smallRNA generation next and annotation miRNA for tool (2011). (2004). Res. nomenclature. gene and targets sequences, microRNA miRBase: genomics. microRNA (2008). for tools (2013). evolution. vertebrate into insights provides genome rdo M. Brudno, for browsers genome five Genomicus: H. Crollius, Roest & M. Muffato, A., Louis, database a Genomicus: H. Crollius, Roest & C.E. Poisnel, A., Louis, M., Muffato, P. Flicek, T. Desvignes, a Prost!, J.H. Postlethwait, & B.F. Eames, J., Sydes, T., Desvignes, P., Batzel, R. Lorenz, Registry. microRNA The S. Griffiths-Jones, Enright,A.J. A.& S.,Bateman, vanDongen, R.J., A.J.miRBase: Grocock, S., Griffiths-Jones, S.&Enright, vanDongen, H.K., Saini, S., Griffiths-Jones, J.J. Smith,

34 , D140–D144 (2006). D140–D144 , Zenodo t al. et t al. et t al. et t al. et t al. et

19 26

doi:10.5281/zenodo.3546 (suppl. 1), i54–i62 (2003). i54–i62 1), (suppl. (2010). 1119–1121 , Glocal alignment: finding rearrangements during alignment. during rearrangements finding alignment: Glocal inaN Pcae 2.0. Package ViennaRNA neb 2013. Ensembl iN nmnltr: ve icroaig eei origins, genetic incorporating view a nomenclature: miRNA eunig f h sa ape ( lamprey sea the of Sequencing uli Ais Res. Acids Nucleic uli Ais Res. Acids Nucleic uli Ais Res. Acids Nucleic Nucleic Acids Res. Acids Nucleic 1 (20 November 2015). November (20 loihs o. Biol. Mol. Algorithms rns Genet. Trends a. Genet. Nat. ermzn marinus Petromyzon

36

41 32

D154–D158 , 41

D700–D705 , 31 , D109–D111 , 45 uli Acids Nucleic D48–D55 , 613–626 , 415–421 ,

6 26 , )

133. 132. 131. 130. 129. 128. 127. 126. 125. 124. 123. 122. 121.

the origin of enamel. of origin the Computing Biol. Genome 25 10 genome. human the to sequences DNA short of alignment efficient transform. (2006). zebrafish. in transgenesis Tol2transposon-mediated Bioinformatics Protoc. Curr. Res. Acids Nucleic genomes. yeast and (2007). Univ. State aligner. blockset genomics. comparative (2004). W273–W279 for tools computational Qu, Q., Haitina, T., Zhu, M. & Ahlberg, P.E. New genomic and fossil data illuminate Team. Core Development R data. count sequence for analysis expression W.Huber,Differential & S. Anders, H. Li, memory- and Ultrafast S.L. Salzberg, & M. Pop, C., Trapnell, B., Langmead, Burrows-Wheeler with alignment read short accurate and Fast R. Durbin, & H. Li, analysis. feature genome S. Fisher, for tool Swiss-army the BEDTools: A.R. Quinlan, A.S. Hinrichs, A. Siepel, R.S. Harris, M. VISTA: Blanchette, I. Dubchak, & E.M. Rubin, A., Poliakov, L., Pachter, K.A., Frazer, , 2078–2079 (2009). 2078–2079 , (2009). R25 , et al. et et al. et (R Foundation for Statistical Computing, 2015). Computing, Statistical for Foundation (R t al. et Bioinformatics The Sequence Alignment/Map format and SAMtools. and format Alignment/Map Sequence The mrvd arie lgmn o Gnmc DNA Genomic of Alignment Pairwise Improved

11 Evaluating the biological relevance of putative enhancers using enhancers putative of relevance biological the Evaluating t al. et t al. et vltoaiy osre eeet i vrert, net worm, insect, vertebrate, in elements conserved Evolutionarily , R106 (2010). R106 , Res. Genome

34 Genome Res. Genome lgig utpe eoi sqecs ih h threaded the with sequences genomic multiple Aligning Nature h US Gnm Bosr aaae udt 2006. update Database: Browser Genome UCSC The , D590–D598 (2006). D590–D598 ,

25

47 , 1754–1760 (2009). 1754–1760 , 526 : Lnug ad niomn fr Statistical for Environment and Language A R:

14 , 11.12.11–11.12.34 (2014). 11.12.11–11.12.34 , , 108–111 (2015). 108–111 ,

, 708–715 (2004). 708–715 , 15 , 1034–1050 (2005). 1034–1050 , Nat. Protoc. Nat. uli Ais Res. Acids Nucleic Nature Ge Nature PD hss Penn. thesis, PhD .

1 Bioinformatics , 1297–1305 , Genome Biol. Genome n etics

32 ,

CORRIGENDA

Corrigendum: The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons Ingo Braasch, Andrew R Gehrke, Jeramiah J Smith, Kazuhiko Kawasaki, Tereza Manousaki, Jeremy Pasquier, Angel Amores, Thomas Desvignes, Peter Batzel, Julian Catchen, Aaron M Berlin, Michael S Campbell, Daniel Barrell, Kyle J Martin, John F Mulley, Vydianathan Ravi, Alison P Lee, Tetsuya Nakamura, Domitille Chalopin, Shaohua Fan, Dustin Wcisel, Cristian Cañestro, Jason Sydes, Felix E G Beaudry, Yi Sun, Jana Hertel, Michael J Beam, Mario Fasold, Mikio Ishiyama, Jeremy Johnson, Steffi Kehr, Marcia Lara, John H Letaw, Gary W Litman, Ronda T Litman, Masato Mikami, Tatsuya Ota, Nil Ratan Saha, Louise Williams, Peter F Stadler, Han Wang, John S Taylor, Quenton Fontenot, Allyse Ferrara, Stephen M J Searle, Bronwen Aken, Mark Yandell, Igor Schneider, Jeffrey A Yoder, Jean-Nicolas Volff, Axel Meyer, Chris T Amemiya, Byrappa Venkatesh, Peter W H Holland, Yann Guiguen, Julien Bobe, Neil H Shubin, Federica Di Palma, Jessica Alföldi, Kerstin Lindblad-Toh & John H Postlethwait Nat. Genet. 48, 427–437 (2016); published online 7 March 2016; corrected after print 25 April 2016

As we intended, other researchers have been able to use the draft spotted gar genome sequence available from the Broad Institute website since December 2011, the assembly LepOcu1 publicly available from NCBI since 13 January 2012 under accession code GCA000242695.1, and the Ensembl gene annotation (version 74, December 2013; http://www.ensembl.org/Lepisosteus_oculatus/Info/Annotation) and recent annotation by NCBI on 15 May 2014 guided by RNA sequence data from ten tissues. While this article was in review, a paper (Nature 526, 108–111, 2015) was published that arrives at conclusions similar to some of our own analyses, and we wish to acknowledge that publication, which used our unpub- lished data and genome annotations, emphasizing the importance of the strategy of early release of sequence data. The correction has been made to the HTML and PDF versions of the article. Nature America, Inc. All rights reserved. America, Inc. © 201 6 Nature npg

NATURE GENETICS