University of Kentucky UKnowledge

Biology Faculty Publications Biology

2-24-2013

Sequencing of the Sea Lamprey (Petromyzon marinus) Genome Provides Insights into Vertebrate Evolution

Jeramiah J. Smith University of Kentucky, [email protected]

Shigehiro Kuraku RIKEN

Carson Holt University of Utah

Tatjana Sauka-Spengler California Institute of Technology

Ning Jiang Michigan State University

See next page for additional authors

Follow this and additional works at: https://uknowledge.uky.edu/biology_facpub

Part of the Genetics and Genomics Commons

Repository Citation Smith, Jeramiah J.; Kuraku, Shigehiro; Holt, Carson; Sauka-Spengler, Tatjana; Jiang, Ning; Campbell, Michael S.; Yandell, Mark D.; Manousaki, Tereza; Meyer, Axel; Bloom, Ona E.; Morgan, Jennifer R.; Buxbaum, Joseph D.; Sachidanandam, Ravi; Sims, Carrie; Garruss, Alexander S.; Cook, Malcolm; Krumlauf, Robb; Wiedemann, Leanne M.; Sower, Stacia A.; Decatur, Wayne A.; Hall, Jeffrey A.; Amemiya, Chris T.; Saha, Nil R.; Buckley, Katherine M.; Rast, Jonathan P.; Das, Sabyasachi; Hirano, Masayuki; McCurley, Nathanael; Guo, Peng; Rohner, Nicolas; Tabin, Clifford J.; Piccinelli, Paul; Elgar, Greg; Ruffier, Magali; Aken, Bronwen L.; Searle, Stephen M.J.; Muffato, Matthieu; Pignatelli, Miguel; Herrero, Javier; Jones, Matthew; Brown, C. Titus; Chung-Davidson, Yu-Wen; Nanlohy, Kaben G.; Libants, Scot V.; Yeh, Chu-Yin; McCauley, David W.; Langeland, James A.; Pancer, Zeev; Fritzsch, Bernd; de Jong, Pieter J.; Zhu, Baoli; Fulton, Lucinda L; Theising, Brenda; Flicek, Paul; Bronner, Marianne E.; Warren, Wesley C.; Clifton, Sandra W.; Wilson, Richard K.; and Li, Weiming, "Sequencing of the Sea Lamprey (Petromyzon marinus) Genome Provides Insights into Vertebrate Evolution" (2013). Biology Faculty Publications. 3. https://uknowledge.uky.edu/biology_facpub/3

This Article is brought to you for free and open access by the Biology at UKnowledge. It has been accepted for inclusion in Biology Faculty Publications by an authorized administrator of UKnowledge. For more information, please contact [email protected].

Authors
Jeramiah J. Smith, Shigehiro Kuraku, Carson Holt, Tatjana Sauka-Spengler, Ning Jiang, Michael S. Campbell, Mark D. Yandell, Tereza Manousaki, Axel Meyer, Ona E. Bloom, Jennifer R. Morgan, Joseph D. Buxbaum, Ravi Sachidanandam, Carrie Sims, Alexander S. Garruss, Malcolm Cook, Robb Krumlauf, Leanne M. Wiedemann, Stacia A. Sower, Wayne A. Decatur, Jeffrey A. Hall, Chris T. Amemiya, Nil R. Saha, Katherine M. Buckley, Jonathan P. Rast, Sabyasachi Das, Masayuki Hirano, Nathanael McCurley, Peng Guo, Nicolas Rohner, Clifford J. Tabin, Paul Piccinelli, Greg Elgar, Magali Ruffier, Bronwen L. Aken, Stephen M.J. Searle, Matthieu Muffato, Miguel Pignatelli, Javier Herrero, Matthew Jones, C. Titus Brown, Yu-Wen Chung-Davidson, Kaben G. Nanlohy, Scot V. Libants, Chu-Yin Yeh, David W. McCauley, James A. Langeland, Zeev Pancer, Bernd Fritzsch, Pieter J. de Jong, Baoli Zhu, Lucinda L Fulton, Brenda Theising, Paul Flicek, Marianne E. Bronner, Wesley C. Warren, Sandra W. Clifton, Richard K. Wilson, and Weiming Li

Sequencing of the Sea Lamprey (Petromyzon marinus) Genome Provides Insights into Vertebrate Evolution

Notes/Citation Information Published in Nature Genetics, v. 45, no. 4, p. 415-421.

This work is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/

Supplementary files are listed at the end of this page.

Supplementary PDF file: Supplementary Figures 1–33, Supplementary Tables 2–5, 7, 11–14 and 17–24 and Supplementary Note.

Supplementary Excel file 1: List of reads excluded from the assembly by the Arachne assembler and the reported reason for their exclusion.

Supplementary Excel file 2: A complete list of human/Callorhinchus milii and human/Fugu CNEs aligning to the lamprey genome.

Supplementary Excel file 3: Gene families that show a significant change in the lamprey lineage, within the species tree (lamprey,gnathostomes) or (Ciona,(lamprey,gnathostomes))

Supplementary Excel file 4: A lamprey/chicken comparative map.

Supplementary Excel file 5: A lamprey/human comparative map.

Supplementary Excel file 6: 224 gene families that are found in vertebrate, but not in invertebrate lineages.

Supplementary Excel file 7: Genes in the lamprey genome with immunity-related ontologies.

Digital Object Identifier (DOI) http://dx.doi.org/10.1038/ng.2568

This article is available at UKnowledge: https://uknowledge.uky.edu/biology_facpub/3

Nature Genetics, advance online publication

Sequencing of the sea lamprey (Petromyzon marinus) genome provides insights into vertebrate evolution

Jeramiah J Smith, Shigehiro Kuraku, Carson Holt, Tatjana Sauka-Spengler, Ning Jiang, Michael S Campbell, Mark D Yandell, Tereza Manousaki, Axel Meyer, Ona E Bloom, Jennifer R Morgan, Joseph D Buxbaum, Ravi Sachidanandam, Carrie Sims, Alexander S Garruss, Malcolm Cook, Robb Krumlauf, Leanne M Wiedemann, Stacia A Sower, Wayne A Decatur, Jeffrey A Hall, Chris T Amemiya, Nil R Saha, Katherine M Buckley, Jonathan P Rast, Sabyasachi Das, Masayuki Hirano, Nathanael McCurley, Peng Guo, Nicolas Rohner, Clifford J Tabin, Paul Piccinelli, Greg Elgar, Magali Ruffier, Bronwen L Aken, Stephen M J Searle, Matthieu Muffato, Miguel Pignatelli, Javier Herrero, Matthew Jones, C Titus Brown, Yu-Wen Chung-Davidson, Kaben G Nanlohy, Scot V Libants, Chu-Yin Yeh, David W McCauley, James A Langeland, Zeev Pancer, Bernd Fritzsch, Pieter J de Jong, Baoli Zhu, Lucinda L Fulton, Brenda Theising, Paul Flicek, Marianne E Bronner, Wesley C Warren, Sandra W Clifton, Richard K Wilson & Weiming Li

A full list of affiliations appears at the end of the paper.

Received 20 July 2012; accepted 31 January 2013; published online 24 February 2013; doi:10.1038/ng.2568

© 2013 Nature America, Inc. All rights reserved.

Lampreys are representatives of an ancient vertebrate lineage that diverged from our own ~500 million years ago. By virtue of this deeply shared ancestry, the sea lamprey (P. marinus) genome is uniquely poised to provide insight into the ancestry of vertebrate genomes and the underlying principles of vertebrate biology. Here, we present the first lamprey whole-genome sequence and assembly. We note challenges faced owing to its high content of repetitive elements and GC bases, as well as the absence of broad-scale sequence information from closely related species. Analyses of the assembly indicate that two whole-genome duplications likely occurred before the divergence of the ancestral lamprey and gnathostome lineages. Moreover, the results help define key evolutionary events within vertebrate lineages, including the origin of myelin-associated proteins and the development of appendages. The lamprey genome provides an important resource for reconstructing vertebrate origins and the evolutionary events that have shaped the genomes of extant organisms.

The fossil record shows that, during the Cambrian period, there was a great elaboration in the diversity of animal body plans. This included the emergence of a species with several characteristics shared with modern vertebrates, such as a cartilaginous skeleton that encases the central nervous system (cranium and vertebral column) and provides a support structure for the branchial arches and median fins. The cartilaginous cranium of this species housed a tripartite brain, with a forebrain for regulating neuroendocrine signaling via the pituitary gland, a midbrain (including an optic tectum) for processing sensory information from paired sensory organs and a segmented hindbrain for controlling unconscious functions, such as respiration and heart rate. These features in adults suggest that the corresponding embryos must have already possessed uniquely vertebrate cell types such as the skeletogenic neural crest and ectodermal placodes, both defining characters of modern-day vertebrates. Subsequent diversification of this lineage gave rise to the jawed vertebrates (gnathostomes), hagfish (for which genome-scale sequence data are currently limited), lamprey and several extinct lineages (Fig. 1).

Recent advances in developmental genetics methods for the lamprey and hagfish have advanced the reconstruction of several aspects of vertebrate evolution, although the interpretation of many of these findings is contingent on an understanding of genome structure, gene content and the history of gene and genome duplication events, areas that remain largely unresolved. Given the critical phylogenetic position of the lamprey as an outgroup to the gnathostomes (Fig. 1), comparing the lamprey genome to gnathostome genomes holds the promise of providing insights into the structure and gene content of the ancestral vertebrate genome. Questions remain about the timing and subsequent elaboration of ancient genome duplication events and the elucidation of genetic innovations that may have contributed to the evolution and development of modern vertebrate features, including jaws, myelinated nerve sheaths, an adaptive immune system and paired appendages or limbs.

Figure 1 An abridged phylogeny of the vertebrates. Shown is the timing of major radiation events within the vertebrate lineage. Extinct lineages and some extant lineages (for example, coelacanths, lungfish and hagfish) have been omitted for simplicity. Here, reptile is synonymous with sauropsid, ray-finned fish is synonymous with actinopterygian, and osteichthyan is synonymous with euteleostome. CZ, Cenozoic; MYA, million years ago. Timeline labels: Precambrian, Paleozoic, Mesozoic, CZ; 550 MYA, 250 MYA, 65 MYA. Lineages: outgroups, lamprey, cartilaginous fish, ray-finned fish, amphibians, reptiles, mammals. Nodes: ancestral vertebrate, ancestral gnathostome, ancestral osteichthyan.

RESULTS
Sequencing, assembly and annotation
Approximately 19 million sequence reads were generated from genomic DNA derived from the liver of a single wild-captured adult female sea lamprey (P. marinus). The lamprey genome project was initiated well before the discovery that the lamprey undergoes programmed genome rearrangements during early embryogenesis, which result in the deletion of ~20% of germline DNA from somatic tissues (refs. 2,3), with the effects of rearrangement on the genic component of the genome not fully understood. We used raw sequence reads to examine large-scale sequence content and the repetitive structure of the lamprey genome. These analyses indicated that the lamprey genome is highly repetitive, rich in GC bases and highly heterozygous (Supplementary Figs. 1–3 and Supplementary Note). Although these features tend to encumber the assembly of long contiguous sequences, analyses of broad-scale structure enabled the optimization of the parameters used in assembly algorithms (Supplementary Note). The current assembly was generated using Arachne (ref. 4) and consisted of 0.816 Gb of sequence distributed across 25,073 contigs. Half of the assembly was in 1,219 contigs of 174 kb or longer, and the longest contig was 2.4 Mb. This assembly resolved multikilobase- to megabase-scale structure over a majority of single-copy genomic regions (Supplementary Note and Supplementary Tables 1,2), permitting the annotation of repetitive elements, genes and conserved intergenic features (Supplementary Note). Detection of extensive conserved synteny with gnathostome genomes indicates that the lamprey scaffolds accurately reflect the chromosomal organization of the lamprey genome. This assembly therefore provides unparalleled resolution of the gene content and structure of this evolutionarily informative genome.
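The statement that half of the assembly lies in 1,219 contigs of 174 kb or longer is the kind of summary usually called an N50. The sketch below shows how such a figure can be computed from a list of contig lengths. The function name `n50` and the toy lengths are illustrative assumptions; this is not the authors' code, and the lamprey contig lengths themselves are not given in the text.

```python
def n50(lengths):
    """Return (N50, count): the contig length at which the running total of
    length-sorted contigs first reaches half the assembly, and how many
    contigs were needed to get there."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths supplied")
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count


# Hypothetical toy lengths (bp), not the lamprey contigs.
toy_lengths = [500, 400, 300, 200, 100]
n50_length, n_contigs = n50(toy_lengths)
print(n50_length, n_contigs)  # prints "400 2"
```

Applied to the real contig lengths, the same calculation would be expected to return a length near 174 kb and a count of 1,219, if the reported figures follow this definition.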

Figure 2 Genome-wide deviation of lamprey coding sequence properties from patterns observed in other vertebrate and invertebrate genomes. (a) Codon usage bias. Correspondence analysis (CA) on relative synonymous codon usage (RSCU) values was performed using the nucleotide sequences of all predicted genes concatenated for individual species. (b) Amino-acid composition. Red, lamprey; gray, invertebrates; green, jawed vertebrates. Both panels plot CA axis 1 against CA axis 2. Species shown: lamprey, human, mouse, dog, pig, opossum, platypus, chicken, zebra finch, X. tropicalis, zebrafish, stickleback, F. rubripes, T. nigroviridis, amphioxus, C. intestinalis, C. savignyi, sea urchin, fruitfly, N. vectensis and S. mansoni.
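The correspondence analysis in Figure 2a takes relative synonymous codon usage (RSCU) values as input. RSCU for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below computes RSCU for one concatenated coding sequence. It assumes the standard genetic code and an uppercase, in-frame sequence. The function name `rscu` is illustrative, the correspondence analysis step is not shown, and this is not the authors' pipeline.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Map each sense codon to its RSCU value for one in-frame sequence.

    Amino acids that never occur in the sequence are left out."""
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    values = {}
    for aa in set(AMINO) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        for c in family:
            # observed count / (total for this amino acid / family size)
            values[c] = counts[c] * len(family) / total
    return values


# Hypothetical toy sequence: alanine encoded by GCC three times and GCT once.
toy = rscu("GCCGCCGCCGCT")
print(toy["GCC"], toy["GCT"], toy["GCA"])
```

A strongly GC-biased genome would be expected to show RSCU values well above 1 for codons ending in G or C, which is the kind of signal that separates lamprey from the other species in the figure.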

Ab initio searches for repetitive DNA sequences showed that the lamprey genome contained abundant repetitive elements with high sequence identity. We identified 7,752 distinct families of repetitive elements, accounting for 34.7% of the assembly (Supplementary Fig. 4 and Supplementary Note). Notably, this proportion is expected to be a significant underestimate, owing to the collapsing of repetitive elements during genome assembly. The large diversity of lamprey repetitive elements and the abundance of high-identity (presumably young) repeats represent a potentially rich resource for studies of the evolution and transposition of repetitive sequences.

The location of genes was determined by combining RNA sequencing (RNA-seq) mapping and exon linkage data with gene homologies and the prediction of coding sequences, splicing signals and repetitive elements using the MAKER pipeline (ref. 5; Supplementary Note). The final set of annotated protein-coding genes contained a total of 26,046 genes. This number is similar to the numbers of predicted protein-coding genes in other vertebrate genomes reported so far. Conserved noncoding elements (CNEs) were identified by homology to published sequences (refs. 6,7). Searches identified a limited number of homologous CNEs in lamprey, 337 (5.0% of 6,670; ref. 6) and 287 (6.0% of 4,782; ref. 7), in close agreement with previous analyses (ref. 8). For those lamprey CNEs that were linked to conserved homologous regions in the lamprey and gnathostome genomes, sequence identity typically extended over approximately half (53%) of the length of the homologous gnathostome CNE (Supplementary Table 6). Thus, either the lamprey lineage diverged from jawed vertebrates before most gnathostome CNE sequences became highly constrained or these CNEs have evolved much more rapidly in the lamprey genome than in jawed vertebrate genomes. Future work on additional lamprey and hagfish genomes should ultimately distinguish between these possibilities.

Variation in nucleotide content and substitution can strongly influence intragenomic functionality and intergenomic comparative analyses. Analysis of the lamprey genome showed that the GC content of the lamprey genome assembly was higher than that of most other vertebrate genome sequences that have been reported. Overall, 46% of the assembly was composed of GC bases, similar to the GC content of raw whole-genome sequencing reads (Supplementary Fig. 6 and Supplementary Note). Genome-wide analyses also showed patterns of intragenomic heterogeneity in GC content, similar to those of amniote species that possess isochore structures, but less variable. Moreover, the GC content of protein-coding regions (61%) was markedly higher than that of noncoding and repetitive regions. As expected, this content was highest in the third position of codons (75%) (Supplementary Note). Patterns of GC bias strongly affect codon usage and the amino-acid composition of lamprey proteins, imparting an underlying structure to lamprey coding sequences that differs substantially from those of all other sequenced vertebrate and invertebrate genomes (Fig. 2 and Supplementary Note).
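The percentages quoted for the assembly (46%), protein-coding regions (61%) and third codon positions (75%) are all GC fractions measured over different sets of bases. The sketch below shows the two calculations on toy input. The function names and sequences are illustrative assumptions, ambiguous bases are simply ignored, and nothing here reproduces the reported values.

```python
def gc_fraction(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc3_fraction(cds):
    """GC fraction at third codon positions of an in-frame coding sequence."""
    # Slicing from index 2 in steps of 3 picks the last base of each
    # complete codon and skips any trailing partial codon.
    return gc_fraction(cds[2::3])


# Hypothetical toy coding sequence, not lamprey data.
toy_cds = "ATGGCCGCGTTC"
print(gc_fraction(toy_cds), gc3_fraction(toy_cds))
```

Comparing `gc3_fraction` for each gene with `gc_fraction` for its flanking noncoding sequence is one simple way to look for the kind of correlation between codon third positions and adjacent regions that the analysis goes on to test.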

Notably, we did not detect a significant correlation between the GC content of the third position of codons and the GC content of adjacent noncoding regions (Supplementary Fig. 7 and Supplementary Note). Thus, it seems that the processes that lead to the patterns of intragenomic heterogeneity in lamprey GC content differ fundamentally from those in other species that possess isochore structures. This raises a question regarding the adaptive value or the biological role of the observed variation of GC content within and among genomes.

To further explore the biological basis of high GC content and its intragenomic heterogeneity, we examined the relationship between the GC content of protein-coding regions and codon usage bias, amino-acid composition and the levels of gene expression. The results showed that genomic GC content correlated strongly with codon usage bias and amino-acid composition but not with the levels of gene expression (Supplementary Figs. 8–11, Supplementary Table 7 and Supplementary Note). These observations are consistent with a scenario in which high GC content results from broad-scale substitution bias rather than selection for specific GC-rich codons. As the lamprey is clearly an outlier among vertebrates, further dissection of coding GC content in the sea lamprey and other lamprey species and hagfish will help to identify the causes and consequences of the intragenomic heterogeneity of GC content in vertebrate genomes.

Duplication structure of the genome
It is generally accepted that two rounds of whole-genome duplication occurred early in the history of vertebrate evolution (ref. 9). However, the timing of these defining duplication events has not been well supported by genome-wide sequence data thus far (ref. 10). As the proximate outgroup to jawed vertebrates, the lamprey genome is uniquely suited for addressing several questions regarding the occurrence, timing and outcome of whole-genome duplication events. To identify gene and genome duplication events in the ancestral vertebrate lineage, we analyzed patterns of duplication within conserved syntenic regions of the lamprey and gnathostome genomes and compared these patterns to the entire lamprey genome assembly.

Figure 3 Conserved synteny and duplication in the lamprey and gnathostome (chicken) genomes. (a–d) The locations of presumptive lamprey-chicken orthologs (including duplicates) are plotted relative to their physical positions on chromosomes and scaffolds and are connected by colored lines. (a,b) Pairs of chicken chromosomes that correspond to a series of lamprey scaffolds. (a) Ten lamprey loci are present as duplicate copies in the chicken genome, and 59 are present as single copies. (b) Twelve lamprey loci are present as duplicate copies in the chicken genome, and 54 are present as single copies. (c,d) Pairs of lamprey scaffolds that correspond to individual chicken chromosomes. (c) Three chicken loci are present as duplicate copies on syntenic lamprey scaffolds. (d) Two chicken loci are present as duplicate copies on syntenic lamprey scaffolds. Asterisks indicate duplicates.

We estimated duplication frequencies by aligning all predicted lamprey protein-coding genes from the MAKER data set to the human (GRCh37, GCA_000001405.1) and chicken (Gallus_gallus-2.1, GCA_000002315.1) whole-genome assemblies. To account for the possibility that paralogs have been retained on one or both genomes, in a way that bypasses many confounding aspects of phylogenetic reconstruction (Supplementary Note), regions were considered putative orthologs if they yielded the highest-scoring alignment between the two genomes or an alignment score (bit score) within 90% of the top-scoring alignment (Supplementary Note). Strong patterns of conserved synteny were observed between the lamprey and both the human and chicken genomes (Supplementary Figs. 12–17, Supplementary Tables 8 and 9 and Supplementary Note). For simplicity, we present comparisons to the chicken genome, as this genome is known to have undergone substantially fewer interchromosomal rearrangements than have mammalian genomes.
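The ortholog rule described here keeps, for each lamprey gene, the top-scoring alignment plus any other alignment whose bit score is within 90% of the top score. A minimal sketch of that filter follows. It reads "within 90%" as a score of at least 0.9 times the best score, which is an interpretation rather than something the text spells out. The `(query, target, bit_score)` tuple layout, the function name and the toy hits are hypothetical.

```python
from collections import defaultdict


def putative_orthologs(hits, fraction=0.9):
    """Group alignment hits by query gene and keep the best-scoring target
    plus every target scoring at least `fraction` of that best bit score.

    `hits` is an iterable of (query_gene, target_region, bit_score)."""
    by_query = defaultdict(list)
    for query, target, score in hits:
        by_query[query].append((target, score))
    kept = {}
    for query, targets in by_query.items():
        best = max(score for _, score in targets)
        kept[query] = sorted(t for t, s in targets if s >= fraction * best)
    return kept


# Hypothetical toy hits, not taken from the lamprey alignments.
toy_hits = [
    ("gene1", "chr1:a", 200.0),
    ("gene1", "chr4:b", 185.0),  # within 90% of 200, kept as a possible paralog
    ("gene1", "chr9:c", 120.0),  # below the threshold, dropped
    ("gene2", "chr2:d", 50.0),
]
print(putative_orthologs(toy_hits))
```

Keeping near-top hits rather than only the single best one is what lets retained paralogs on either genome show up as duplicate positions in the comparative map.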

Our analyses indicate that most lamprey and gnathostome genes do not currently possess two copies resulting from the two rounds of whole-genome duplication in their respective genomes (Supplementary Tables 11 and 12), presumably owing to the frequent loss of one paralog after duplication (ref. 9). Accordingly, we used the lamprey genome to search for a signature of large-scale duplication that does not rely on the retention of duplicated genes but can be informed by their presence. Specifically, we searched for cases in which a single lamprey scaffold contained interdigitated homologies from two distinct regions of a gnathostome genome (Fig. 3). Such patterns are consistent with large-scale duplication followed by random loss of either paralogous copy. Nearly all lamprey scaffolds showed patterns of interdigitated conserved synteny of gnathostome orthologs (Supplementary Tables 9 and 10). Moreover, homologs from individual pairs of gnathostome chromosomes were recurrently observed in interdigitated syntenic blocks on several lamprey scaffolds. Notably, some of the individual markers that contributed to these conserved syntenic blocks were mapped to duplicate positions within gnathostome genomes, being present on the two homologous gnathostome chromosomes. Although these duplicates constituted a relatively modest fraction of the conserved syntenic homologs (14.5%, Fig. 3a; 18.2%, Fig. 3b; not counting redundant copies), we interpret these as strong evidence that large-scale (whole-genome) duplication has had a major role in shaping gnathostome genome architecture.

Similar duplication patterns on lamprey scaffolds also seem to support the notion that large-scale (whole-genome) duplication has had a major role in shaping lamprey genome architecture.

Although lamprey scaffolds do not yet provide chromosome-scale resolution, several cases were identified in which two large lamprey scaffolds contained predicted paralogs and patterns of interdigitated conserved synteny (two defining signatures of large-scale duplication; Fig. 3c,d and Supplementary Note). To further assay for patterns indicative of ancient whole-genome duplication events (for example, two rounds) within the lamprey genome, we manually examined all lamprey scaffolds that possessed ten or more gnathostome homologs. These 83 scaffolds accounted for 10% of the comparative map (10% of homology-informative genes) and possessed a duplication frequency (0.463, including redundant copies of duplicates) that was similar to that of the genome at large (0.448). Among these scaffolds, we identified 29 gene pairs that were present as duplicates on two large scaffolds and one trio that was present on three large scaffolds. For a majority of duplicates, the scaffolds contained at least one additional ortholog on the chicken chromosome that harbored an ortholog of the duplicate (specifically, both scaffolds (59.3%), one scaffold (29.6%) and no scaffold (11.1%) contained an additional syntenic ortholog). On average, these scaffolds contained 2.98 additional conserved syntenic genes for each individual lamprey duplicate (including the 11.1% with no syntenic markers). These patterns are consistent with the existence of patterns of highly interdigitated synteny in the lamprey genome that are similar to those in gnathostome genomes, indicating that the most recent (two-round) whole-genome duplication event likely occurred in the common ancestral lineage of lampreys and gnathostomes.

Additional genome-wide analyses showed that (i) the number of ancestral loci with retained duplicates in gnathostome genomes was not significantly different from the number with retained duplicates in lamprey (lamprey = 0.271, chicken = 0.262; χ² = 2.94, P = 0.08; Supplementary Note); (ii) the frequency of shared duplications was higher than would be expected by chance (observed = 0.150, expected = 0.022; χ² = 6179, P(χ²) < 1 × 10^−100, P (Fisher's exact test) < 1 × 10^−100; Supplementary Note); (iii) a model invoking recurrent selection against small-scale duplicates across a majority of the genome was not sufficient to explain genome-wide patterns of shared duplication (Supplementary Figs. 12–17 and Supplementary Note); and (iv) inclusion of the lamprey in phylogenetic analyses resolved gene families consistent with two rounds of whole-genome duplication (Supplementary Figs. 18–21 and Supplementary Note).
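Comparisons (i) and (ii) are reported as chi-square statistics, but the text gives only frequencies, not the underlying counts. The sketch below shows a Pearson chi-square test on a 2 × 2 table of counts, with the P value for one degree of freedom obtained from the complementary error function. The counts are hypothetical, the one-degree-of-freedom setup and the absence of a continuity correction are assumptions, and the paper's exact procedure is described only in its Supplementary Note.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]], with the P value for 1 degree of freedom.

    Every row and column total must be nonzero."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: loci with and without retained duplicates in two genomes.
chi2, p_value = chi_square_2x2(20, 10, 10, 20)
print(round(chi2, 3), round(p_value, 4))
```

With real counts in place of the toy table, a small statistic such as the one reported for comparison (i) would correspond to similar duplicate-retention frequencies in the two genomes, and a very large one such as that reported for (ii) to far more shared duplications than chance would predict.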

Moreover, targeted analyses of Hox clusters and gonadotropin-releasing hormone (GnRH) syntenic regions showed that the loss of paralogs after duplication occurred largely independently in the lamprey and gnathostome genomes, consistent with the divergence of the two lineages shortly after the last whole-genome duplication event (Fig. 4, Supplementary Figs. 22–24 and Supplementary Note). Although the less parsimonious scenario involving one or two independent and ancient whole-genome duplication events in gnathostome and lamprey lineages cannot be completely ruled out, neither a lamprey-specific genome duplication nor persistent selection to retain a subset of independent duplicates is likely to explain the subtle differences in the duplication structures of the lamprey and gnathostome genomes. It seems exceedingly unlikely that such genomic arrangements and distributions of synteny blocks would arise by chance or mechanisms other than an ancient shared whole-genome duplication event. We therefore propose that genome-wide patterns of duplication are indicative of a shared history of two rounds of genome-wide duplication before lamprey-gnathostome divergence.

Figure 4 The effect of genome duplication and independent paralog loss on the evolution of lamprey-gnathostome conserved syntenic regions. (a) Conserved synteny among the GnRH2, GnRH3 and (previously proposed) GnRH4 genes in lamprey, chicken and humans, including the medaka region for GnRH3, which is absent in tetrapods. The orientation of each chromosome (chr.) and scaffold (scf.) is indicated with line arrows. A pointed box represents the orientation of each gene. Open rectangles with red X's indicate lost GnRH loci. The presumptive ancestral state of the gene region is shown at the bottom. (b) Assembled lamprey Hox scaffolds and patterns of conserved synteny relative to human Hox clusters (human Hox clusters are used rather than chicken because all four human Hox syntenic regions are integrated into the human genome assembly). Three additional conserved syntenic genes, located adjacent to the PM2Hox cluster, are omitted owing to space limitations (retinoic acid receptor (RAR), heterogeneous nuclear ribonucleoprotein (HNRNP) and thyroid hormone receptor (THR)). Symbols indicate representative family members of lamprey-gnathostome homology groups.

Ancestral vertebrate biology
It has been suggested that many of the morphological and physiological features that characterize vertebrates evolved through the modification of preexisting regulatory regions and gene networks. However, we reasoned that the lamprey genome might enable us to identify genes that evolved within the ancestral vertebrate lineage and infer how these new genes might have contributed to specific innovations in ancestral vertebrates that arguably contributed to their successful evolutionary trajectory. Toward this end, we searched for lamprey genes that (i) had homologs in at least one sequenced gnathostome genome and (ii) had no identifiable invertebrate homolog in annotated sequence databases and genome project–based resources (including but not limited to invertebrate deuterostomes: sea urchin, sea limpet, acorn worm, lancelet and sea squirt). In total, this search identified 224 gene families that presumably trace their evolutionary origin to the ancestral vertebrate lineage (Supplementary Note). Notably, these included many gene families whose taxonomic distribution was previously thought to be more restricted (for example, APOBEC4 was previously reported to be a tetrapod-specific gene). Thus, roughly 1.2–1.5% of the protein-coding landscape in the human genome (263 genes from 224 families out of ~20,000 genes) originated from new genes that emerged at the base of vertebrate evolution.

© 2013 Nature America, Inc. All rights reserved. of of and protein), proteolipid Mpz include genes are thought to completely lack myelinating oligodendrocytes Note Supplementary vertebrates jawed of systems ( nervous peripheral and in central formation myelin the with associated genes of the enrichment identified specific genome lamprey the of analysis Notably, disorders. movement to cognitive from range that manifestations many have myelination of disorders humans, In conduction. neuronal of speed and efficiency the increasing lipids, and proteins of layer a in axons animals. throughout common are and origin ancient of that view most genes involved in the of are regulation morphogenesis held broadly the with consistent also were analyses Ontology genes. vertebrate new of advent the by facilitated nervous been have central might system vertebrate the in signaling of elaboration the that and peptide ( neurohormone signaling related to and in myelination neuro functions enriched significantly lamprey ont of distribution genome-wide the to ontologies gene these ( lies fami gene vertebrate-specific 224 the for information (functional) ontology gene collected we ancestor, vertebrate the of evolution the Tables 16 Supplementary of lamprey and gnathostome immunity ( evolution the in parallels broad reflect which lineages, gnathostome versus lamprey the in inflammation and function neural to related lamprey the in genes lineage and the differential contraction and clotting-related expansion of of gene families loss specific the included ( ages line showed vertebrate within also families gene of analyses reductions and Phylogenetic expansions evolution. vertebrate the at of emerged base that genes new from originated genes) ~20,000 out of families 224 from genes (263 genome human the in landscape signaling. neurohormone and development neural to related P with over-represented are that ontologies all for shown are Data models. 
gene lamprey of set entire the in and families gene vertebrate-specific among classes ontology of frequencies the show bars Horizontal families. 5 Figure Nature Ge Nature invertebrate an Fig. Neuropeptide hormone Neuropeptide signaling < 0.005 (Fisher’s exact test). Most over-represented ontologies are are ontologies over-represented Most test). exact (Fisher’s < 0.005 Negative regulationof In all extant gnathostomes, myelinating oligodendrocytes wrap wrap oligodendrocytes myelinating gnathostomes, extant all In to contributed have might genes new how understand better To Mal o Internode regionof Chemokine activity Myt1l (encoding myelin protein zero), as as well zero), protein myelin (encoding logies showed that these vertebrate-specific gene families were were families gene vertebrate-specific these that showed logies Supplementary Fig. 31 Fig. Supplementary

5 Synaptic vesicl Compact myeli upeetr Tbe 8 Table Supplementary and and , ,

Supplementary Fig. Supplementary 32 Adult feeding Enrichment of gene ontologies among vertebrate-specific gene gene vertebrate-specific among ontologies gene of Enrichment (encoding myelin 1-like). Homologs Homologs 1-like). factor transcription myelin (encoding n targeting behavior pathway Pmp22 appetite activity etics axon Pmp22 n e

were reported to be present in in present be to reported were 0 0 ADVANCE ONLINE PUBLICATION ONLINE ADVANCE Mal ), ), despite the that fact extant vertebrates jawless (encoding peripheral myelin protein 22) and and 22) protein myelin peripheral (encoding .0 (encoding myelin and lymphocyte protein) myelin and lymphocyte (encoding 1 1 6 – , and putative putative and , 22 0.02 and and and , , and and Supplementary TablesSupplementary 15 Supplementary Note Supplementary 0.03 Supplementary Note Supplementary Proportion ofgenes upeetr Note Supplementary Fig. Fig. 0.04 Supplementary Figs. Supplementary 25 Ciona 5 0.0 ). These findings ). suggest These findings 5 All lampreygenes familie Vertebrate-specific Plp homologs of of homologs 0.06 (encoding myelin (encoding Ciona intestinalis Ciona s

0.07 ). Comparing ). Comparing ). , 23 0.08 1 ). These These ). 5 , . . These 24 Myt1l – 0.09 and 30

- - ­ , ,

genes that impart unique functionality to gnathostome T and B to lym gnathostome functionality unique impart that genes vertebrate ancestral the of receptor the reflecting instead perhaps immunoglobulins, ostome gnath to unrelated are that receptors immune adaptive possess but gnathostomes of lymphocytes B and T the to similar are that types cell immune major two possess Lamprey system. immune vertebrate the of evolution the understanding for model comparative key a as myelin. gnathostome of origin the on light shed to continue should hagfish and lamprey in genes related myelination- of function the Dissecting proteins. genes myelin encoding retained it although lineage, lamprey were the but in lost ancestor secondarily vertebrate the in present been have might cells evolution of systems regulatory the through perhaps lineage, myelinating gnathostome the of in evolution oligodendrocytes the in recruited later were and vertebrate ancestor the in existed already myelin of components molecular Tables 2 ( age line vertebrate ancestral the within specifically evolved have might that genes myelination-related three identified genome lamprey the and individual species are listed on the right. ND, not determined. not ND, right. the on listed are species individual potential harboring lamprey MFCS1) as the containing intron the the in region intronic an of Comparison lamprey. 6 Figure the receptors, VLRs) extends lymphocyte for several hundred contiguous variable kilobases. For example, (encoding locus receptor immune lamprey rearranging each that showed BAC clones end-mapped and and receptors ( immune receptorsinnate might havevertebrate coincided in with the evolution complexity of adaptive reduced immune the that showed system immune the of components other of Annotation phocytes. 
Clawed frog ′ Anole lizard ,3 Little skate By virtue of its basal phylogenetic position, By the of lamprey virtue position, serves its also phylogenetic basal Lamprey Lamprey VLRB ′ Chicken Mouse Medaka Supplementary NoteSupplementary Spotted -cyclic nucleotide 3-phosphodiesterase); 3-phosphodiesterase); nucleotide -cyclic Plp1 Mbp ratfish

15

Absence of sequence conservation for a limb a limb for conservation sequence of Absence are identifiable in Ensembl in identifiable are (encoding myelin basic protein), protein), basic myelin (encoding Exon locus extends for at least 717 kb, with components of the the of components with kb, 717 least at for extends locus 22 8 4 0 , Supplementary Figs. 25 23 , 2 4 and and 5 . Note that two genomic regions were identified in the the in identified were regions genomic two that . Note upeetr Note Supplementary ShARE Base positioninthemouseintron(kb) Shh cis Shh 19 ). ). Analysis of the lamprey genome assembly Lmbr1 , 2 12 0 . The lamprey genome harbors several several harbors genome lamprey The . Lmbr1 -regulatory element (ShARE, also known known also (ShARE, element -regulatory 1 8 orthologs. The lengths of this intron for for intron this of lengths The orthologs. . . Alternatively, oligodendrocyte-like 16 gene – 30 , 1 20 Supplementary Tables 16 7 . Unexpectedly, analysis of of analysis Unexpectedly, . ). This suggests that the the that suggests This ). Mpz Lmbr1 24 Scaffold_164 Scaffold_408 and and EF100656 EF100665 s e l c i t r A Shh 28 gene, focusing on focusing gene, Supplementary Supplementary Exon CNP enhancer in enhancer 50% 75% 100% 50% 75% 100% 50% 75% 100% 50% 75% 100% 50% 75% 100% 50% 75% 100% 50% 75% 100% 50% 75% 100% 6 (encoding (encoding Intron length 30.1 21.3 10.1 20.0 ND ND (kb) 2.1 0.3 6.9 – 22  - - -

© 2013 Nature America, Inc. All rights reserved. gous positions in tetrapods, and chondrichthyans and teleosts tetrapods, in positions gous is found element (ShARE) in homolo regulatory appendage-specific of axis of appendages. It has been shown that the limb-specific expression and limb development, in the limbless ancestral vertebrate ( present already were pathways these whether of question the raising development for fin were paired reused fins of median positioning onic pathways the origins, signaling involved in the development and embry different Despite fins. paired but lacks fins caudal and dorsal well-developed has lamprey The behavior. and locomotion of forms additional permitted they as vertebrates, gnathostome of innovation fins in fish, hind- and forelimbs in tetrapods) are a major evolutionary pectoral and (pelvic appendages Paired split. lamprey-gnathostome the after lineage, gnathostome the of evolution the in early occurred ( scaffold current the of length entire the across practically drawn from receptor face being regions distributed s e l c i t r A  no similarity to ShAREs ( Lmbr1 of the prey orthologs on lam the our analysis we focused vertebrates, other in element the secondarily element this lost have to seem caecilians and snakes as lineage, the within least at appendages, paired of presence the with correlated is from the transcription start site of the far,so analyzed is element found in intron this 5 species of vertebrate for Hox clusters have been deposited under GenBank accessions accessions GenBank under JQ70631 deposited been have clusters Hox accession for GenBank under ited codes. Accession the pape the of in version available are references associated any and Methods M Improvi Rebuilder, www.repea URLs. 
lamprey in analysis functional direct for capacity the and tebrate biology, with especially continued refinements in the assembly holds the promise of providing insights into many other aspects of ver and ment evolved within the gnathostome lineage. This genomic resource evolution develop limb in receptor element regulatory key a immune that evidence provide in (v) parallels uncover new of (iv) advent the genes, to features signaling lineage, neural ancestral vertebrate this link within (iii) evolved that genes gnathostomes, new and identify lampreys (ii) of lineage ancestral common events the duplication in whole-genome two for evidence genome-wide provide (i) we examples, As biology. vertebrate ancestral of aspects and genomes vertebrate of exam evolution the few dissecting a in use its of present ples we Here, lineage. vertebrate and the origin of the evolution into insight unique provides genome lamprey The DISCUSSION lineage. gnathostome the within evolved region regulatory this that suggesting ShARE, to similar regions any detect not did also reads sequence raw and assembly genome entire the of

ethods The lamprey genome also sheds light on the evolutionary events that Shh Lmbr1 orthologs showed that these introns were much shorter and had is coordinated by a long-range long-range a by coordinated is CodonW, ng_Assemblie 4 – gene (encoding limb region 1) that lies up to 1 Mb away Mb 1 to up lies that 1) region limb (encoding gene J tmasker.org Q70632 http://www.broad The lamprey genome assembly has been depos been has assembly genome lamprey The http:// r 7 . 2 . Transcript sequencing data have been 5 Lmbr1 . Because of the conserved genomic position of position genomic of conserved the . Because s . / Shh ; Repbase, Repbase, ; Fig. codonw.sourceforge.n is to required the anteroposterior pattern gene. Directed analysis of intron analysis 5 in gene. Directed the 6 and institute.org/crd/wi Shh Supplementary Note Supplementary Fig. 33 AEFG0 http:// . Notably, the presence of ShARE cis Supplementary Fig. 25 Fig. Supplementary -acting enhancer. This This enhancer. -acting www.girinst.org/repb 1 . Improved assemblies assemblies Improved . et / ; RECON, RECON, ; ki/index.php/ ). During fin 26 22– ). Searches

deposited , 2 2 7 4 . http:// online . In all all In . as ). Shh 2 e 1 ------, ;
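The Figure 5 legend states that ontology over-representation was assessed with Fisher's exact test (P < 0.005). A minimal sketch of such a test for a single ontology class is given below in plain Python. It is not the authors' implementation: the function name, the one-sided (over-representation) formulation, the treatment of the vertebrate-specific families as a subset of the background gene models, and every count other than 224 are assumptions made for illustration.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value for over-representation.

    Hypothetical argument names (not from the paper):
      k: vertebrate-specific families annotated with the ontology class
      n: all vertebrate-specific families (e.g. 224)
      K: all gene models annotated with the class (assumed to include k)
      N: all gene models in the background set

    Returns P(X >= k) where X follows the hypergeometric distribution
    obtained by drawing n items from N, of which K carry the annotation.
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities of k, k+1, ..., upper hits.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def is_over_represented(k, n, K, N, alpha=0.005):
    """Apply the P < 0.005 cutoff quoted in the Figure 5 legend."""
    return fisher_enrichment_p(k, n, K, N) < alpha
```

As a usage illustration, `fisher_enrichment_p(6, 224, 40, 26000)` would return the probability of observing six or more annotated families among 224 when 40 of 26,000 background models carry the term. The values 6, 40 and 26,000 are hypothetical and are not taken from the paper.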

Transcript sequencing data have been deposited under GenBank Short Read Archive accessions SRX109761, SRX109762, SRX109764–SRX109770 and SRX110023–SRX110035. Additional information is provided in Supplementary Table 5.

Note: Supplementary information is available in the online version of the paper.

ACKNOWLEDGMENTS
We thank the Genome Institute, Washington University School of Medicine, Production Sequencing group for all sample procurement and genome sequencing work, the Michigan State University Genomic Core for transcriptome sequencing and the US Geological Survey, Lake Huron Biological Station for providing lamprey samples for sequencing. We thank F. Antonacci and E.E. Eichler (University of Washington) for performing FISH and providing access to computational facilities, respectively. We thank M. Robinson for bioinformatic analysis of immune system–related genes and conversion of GFF files for BAC end mapping. A portion of this research was conducted at the Marine Biological Laboratory (Woods Hole, Massachusetts). We acknowledge the support of the Stowers Institute for Medical Research (SIMR) and technical support from the SIMR Molecular Biology Core, particularly A. Perera, K. Staehling and K. Delventhal for BAC screening and sequencing. We acknowledge the Center for High-Performance Computing at the University of Utah for the allocation of computational resources toward gene annotation. We recognize all the important work that could not be cited owing to space limitations. The lamprey genome project was funded by the National Human Genome Research Institute (U54HG003079 (R.K.W.)). Additional support was provided by grants from the US National Institutes of Health (R24GM83982 (W.L.)) and the Great Lakes Fisheries Commission (W.L.). Partial funding was provided by several additional sources, including grants from the US National Institutes of Health (F32GM087919 and T32HG00035 (J.J.S.); R03NS078519 (O.E.B.); DE017911 (M.E.B.); R01HG004694 (M.D.Y.); GM079492, GM090049 and RR014085 (C.T.A.); and R37HD032443 (C.J.T.)), the National Science Foundation (MCB-0719558 (C.T.A.); IBN-0208138 (L. Holland); IOS-0849569 (S.A.S.); and IOS-1126998 (M.D.Y.)), the New Hampshire Agricultural Experiment Station (Scientific Contribution Number 2471 (S.A.S.)), the Charles Evans Research Award (O.E.B., J.D.B. and J.R.M.), the Wellcome Trust (WT095908 and WT098051 (P.F.)), the Canadian Institutes of Health Research (MOP74667 (J.P.R.)) and the Natural Sciences and Engineering Research Council of Canada (312221 (J.P.R.)).

AUTHOR CONTRIBUTIONS
J.J.S. developed the assembly, coordinated analyses, performed analyses of genome structure and conserved synteny, coordinated the manuscript, and wrote and edited the manuscript. S.K. contributed to analyses of GC content, assembly completeness, vertebrate-specific genes, myelin-related genes and limb development, and to the preparation of the manuscript and supplements. C.H. compiled molecular data sets and developed the consortium gene annotations and annotation pipeline. T.S.-S. developed the protocol for the preparation of BACs, identified the sequenced individual, and prepared genomic DNA for sequencing and BAC library construction. N.J. performed computational identification and analysis of transposable elements. M.D.Y. and M.S.C. contributed to the development of the consortium gene annotations and the annotation pipeline. T.M. and A.M. performed analysis of vertebrate-specific gene families, codon usage bias and amino-acid composition, and contributed to the writing of the manuscript. S.D. and M.H. contributed to analysis of codon usage bias and amino-acid composition. O.E.B., J.R.M., J.D.B. and R.S. performed experiments generating neuronal transcriptomes and sequence data, and analysis related to the vertebrate central nervous system. C.S., L.M.W., A.S.G., M.C. and R.K. performed experiments and data analysis related to the identification and annotation of Hox genes, led and prepared by L.M.W. S.A.S., W.A.D. and J.A.H. performed analyses related to the evolution of neuroendocrine genes, led by S.A.S. and prepared by W.A.D. C.T.A., K.M.B., N.R.S., J.P.R., S.D. and M.H. performed analyses related to the evolution of immune system genes, led and prepared by C.T.A., K.M.B., J.P.R. and M.H. N.R. and C.J.T. performed analyses related to the evolution and development of appendages. P.P. performed BLAST analyses of the noncoding portion of the lamprey genome, and G.E. analyzed BLAST output and wrote the corresponding sections. B.L.A., M.R. and S.M.J.S. developed the Ensembl gene set, led and prepared by M.R. M.M., M.P. and J.H. performed GeneTree and CAFE analysis for the study of whole-genome duplications at the stem of the vertebrate lineage and prepared the corresponding sections. T.S.-S., M.J., J.A.L. and D.W.M. developed the protocol for the preparation of cDNA. N.M. and P.G. provided isolated leukocyte RNA. C.T.B. and K.G.N. performed transcriptome assemblies. W.L., Y.-W.C.-D., S.V.L., C.-Y.Y. and D.W.M. contributed to next-generation transcriptome sequencing. Z.P. provided lamprey leukocyte RNA and cDNA samples and libraries, and evaluated the first draft assembly of the genome. B.F. contributed to the development of neurodevelopment-related text. P.J.d.J. and B.Z. generated the BAC library used for genome sequencing and assembly. L.L.F., W.C.W. and S.W.C. contributed to sequencing project management. B.T. coordinated the cDNA sequencing projects. P.F. supervised the Ensembl annotation efforts. M.E.B. contributed to the conception of the sea lamprey genome project and the development of the manuscript. R.K.W. provided supervision of the genome sequencing project. W.L. provided coordination of the consortium and analysis of the assembly, and contributed to the development of the manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

This work is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.

1. Shimeld, S.M. & Donoghue, P.C. Evolutionary crossroads in developmental biology: cyclostomes (lamprey and hagfish). Development 139, 2091–2099 (2012).
2. Smith, J.J., Baker, C., Eichler, E.E. & Amemiya, C.T. Genetic consequences of programmed genome rearrangement. Curr. Biol. 22, 1524–1529 (2012).
3. Smith, J.J., Antonacci, F., Eichler, E.E. & Amemiya, C.T. Programmed loss of millions of base pairs from a vertebrate genome. Proc. Natl. Acad. Sci. USA 106, 11212–11217 (2009).
4. Jaffe, D.B. et al. Whole-genome sequence assembly for mammalian genomes: Arachne 2. Genome Res. 13, 91–96 (2003).
5. Cantarel, B.L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196 (2008).
6. Woolfe, A. et al. CONDOR: a database resource of developmentally associated conserved non-coding elements. BMC Dev. Biol. 7, 100 (2007).
7. Venkatesh, B. et al. Ancient noncoding elements conserved in the human genome. Science 314, 1892 (2006).
8. McEwen, G.K. et al. Early evolution of conserved regulatory sequences associated with development in vertebrates. PLoS Genet. 5, e1000762 (2009).
9. Ohno, S. Gene duplication and the uniqueness of vertebrate genomes circa 1970–1999. Semin. Cell Dev. Biol. 10, 517–522 (1999).
10. Kuraku, S., Meyer, A. & Kuratani, S. Timing of genome duplications relative to the origin of the vertebrates: did cyclostomes diverge before or after? Mol. Biol. Evol. 26, 47–59 (2009).
11. International Chicken Genome Sequencing Consortium. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695–716 (2004).
12. Smith, J.J. & Voss, S.R. Gene order data from a model amphibian (Ambystoma): new perspectives on vertebrate genome structure and evolution. BMC Genomics 7, 219 (2006).
13. Rogozin, I.B., Basu, M.K., Jordan, I.K., Pavlov, Y.I. & Koonin, E.V. APOBEC4, a new member of the AID/APOBEC family of polynucleotide (deoxy)cytidine deaminases predicted by computational analysis. Cell Cycle 4, 1281–1285 (2005).
14. Carroll, S.B. Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell 134, 25–36 (2008).
15. Bullock, T.H., Moore, J.K. & Fields, R.D. Evolution of myelin sheaths: both lamprey and hagfish lack myelin. Neurosci. Lett. 48, 145–148 (1984).
16. Gould, R.M., Morrison, H.G., Gilland, E. & Campbell, R.K. Myelin tetraspan family proteins but no non-tetraspan family proteins are present in the ascidian (Ciona intestinalis) genome. Biol. Bull. 209, 49–66 (2005).
17. Flicek, P. et al. Ensembl 2011. Nucleic Acids Res. 39, D800–D806 (2011).
18. Newbern, J. & Birchmeier, C. Nrg1/ErbB signaling networks in Schwann cell development and myelination. Semin. Cell Dev. Biol. 21, 922–928 (2010).
19. Saha, N.R., Smith, J. & Amemiya, C.T. Evolution of adaptive immune recognition in jawless vertebrates. Semin. Immunol. 22, 25–33 (2010).
20. Guo, P. et al. Dual nature of the adaptive immune system in lampreys. Nature 459, 796–801 (2009).
21. Freitas, R., Zhang, G. & Cohn, M.J. Evidence that mechanisms of fin development evolved in the midline of early vertebrates. Nature 442, 1033–1037 (2006).
22. Dahn, R.D., Davis, M.C., Pappano, W.N. & Shubin, N.H. Sonic hedgehog function in chondrichthyan fins and the evolution of appendage patterning. Nature 445, 311–314 (2007).
23. Lettice, L.A. et al. A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet. 12, 1725–1735 (2003).
24. Sagai, T., Hosoya, M., Mizushina, Y., Tamura, M. & Shiroishi, T. Elimination of a long-range cis-regulatory module causes complete loss of limb-specific Shh expression and truncation of the mouse limb. Development 132, 797–803 (2005).
25. Sagai, T. et al. Phylogenetic conservation of a limb-specific, cis-acting regulator of Sonic hedgehog (Shh). Mamm. Genome 15, 23–34 (2004).
26. Nikitina, N., Sauka-Spengler, T. & Bronner-Fraser, M. Dissecting early regulatory relationships in the lamprey neural crest gene network. Proc. Natl. Acad. Sci. USA 105, 20083–20088 (2008).
27. Nikitina, N., Bronner-Fraser, M. & Sauka-Spengler, T. The sea lamprey Petromyzon marinus: a model for evolutionary and developmental biology. in Emerging Model Organisms: A Laboratory Manual Vol. 1 (eds. Behringer, R.R., Johnson, A.D. & Krumlauf, R.E.) 405–429 (CSHL Press, Cold Spring Harbor, New York, 2009).

Department of Biology, University of Kentucky, Lexington, Kentucky, USA. Benaroya Research Institute at Virginia Mason, Seattle, Washington, USA. Genome Resource and Analysis Unit, Center for Developmental Biology, RIKEN, Kobe, Japan. Department of Biology, University of Konstanz, Konstanz, Germany. Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah, USA. Division of Biology, California Institute of Technology, Pasadena, California, USA. Department of Horticulture, Michigan State University, East Lansing, Michigan, USA. The Feinstein Institute for Medical Research, Manhasset, New York, USA. The Hofstra North Shore–Long Island Jewish (LIJ) School of Medicine, Hempstead, New York, USA. Marine Biological Laboratory, Woods Hole, Massachusetts, USA. Department of Psychiatry, Mount Sinai School of Medicine, New York, New York, USA. Department of Neuroscience, Mount Sinai School of Medicine, New York, New York, USA. Department of Genetics and Genomics Sciences, Mount Sinai School of Medicine, New York, New York, USA. Friedman Brain Institute, Mount Sinai School of Medicine, New York, New York, USA. Stowers Institute for Medical Research, Kansas City, Missouri, USA. Department of Anatomy & Cell Biology, The University of Kansas School of Medicine, Kansas City, Kansas, USA. Department of Pathology and Laboratory Medicine, University of Kansas School of Medicine, Kansas City, Kansas, USA. Center for Molecular and Comparative Endocrinology, University of New Hampshire, Durham, New Hampshire, USA. Department of Biology, University of Washington, Seattle, Washington, USA. Department of Immunology, University of Toronto, Sunnybrook Research Institute, Toronto, Ontario, Canada. Department of Medical Biophysics, University of Toronto, Sunnybrook Research Institute, Toronto, Ontario, Canada. Department of Pathology and Laboratory Medicine, Emory University, Atlanta, Georgia, USA. Emory Vaccine Center, Emory University, Atlanta, Georgia, USA. Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA. Medical Research Council (MRC) National Institute for Medical Research, London, UK. Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, USA. Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, USA. Department of Fisheries & Wildlife, Michigan State University, East Lansing, Michigan, USA. Department of Zoology, University of Oklahoma, Norman, Oklahoma, USA. Department of Biology, Kalamazoo College, Kalamazoo, Michigan, USA. Department of Biochemistry & Molecular Biology, University of Maryland School of Medicine, Baltimore, Maryland, USA. Department of Biology, University of Iowa, Iowa City, Iowa, USA. Children's Hospital Oakland, Oakland, California, USA. The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, USA. Present addresses: Ontario Institute for Cancer Research, Informatics and Bio-Computing, Toronto, Ontario, Canada (C.H.), The Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK (T.S.-S.), Institute of Microbiology, Chinese Academy of Sciences, Beijing, China (B.Z.) and The Advanced Center for Genome Technology, Norman, Oklahoma, USA (S.W.C.). Correspondence should be addressed to W.L. ([email protected]) or J.J.S. ([email protected]).
ONLINE METHODS
Genome sequencing. Sea lamprey DNA for whole-genome shotgun sequencing and fosmid and BAC libraries was derived from a liver dissected from a single female lamprey captured from the Great Lakes. Production of BAC library CHORI-303 was described previously. Other libraries were cloned into bacterial vectors, arrayed individually into the wells of growth trays and sequenced as previously described.

Preassembly analyses. Several analyses were performed before initiating the assembly. These provided insight as to the selection of the assembler. Initial characterization of the repetitive content of the genome was performed by selecting a subset of 10,000 high-quality shotgun sequence reads (>500 bp at Q20) and aligning these to the complete data set of 18.5 million whole-genome shotgun sequence reads (Q20 trimmed). A complementary analysis was also performed by aligning 10,000 trimmed whole-genome shotgun sequence reads from a single human to a complete data set of 12.1 million whole-genome shotgun sequence reads (Q20 trimmed). All reads were downloaded in .scf format from the NCBI Trace Archives and processed with phred (refs. 33,34) to generate base calls and quality scores. Alignments to human and lamprey whole-genome shotgun sequence data sets were performed using Megablast.

To gain insight into the potential influence of allelic polymorphism, we estimated the depth of coverage by processing Megablast (ref. 35) alignments between a subset of reads and the entire whole-genome shotgun sequencing effort, as described above, but with varying thresholds for percent nucleotide identity between aligning sequences. Distributions of depth of coverage were estimated using sequence identity thresholds of 90%, 95%, 97% and 99%.

Genome assembly. Assembly of the lamprey genome was performed using a total of ~19 million sequence reads with Arachne parameterized for the assembly of an outbred diploid genome (Supplementary Note). After assembly by the Assemblez module, contigs corresponding to divergent haplotypes were assembled together using the Rebuilder module, parameterized with liberal settings (see URLs) that permitted the merger of divergent haplotypes, and were then joined using linkage information from end-read mapping information. End-mapping information was incorporated via the ExtendHaploSupers module in a series of steps that prioritized the number of end reads supporting linkages between contigs and the source of end-mapping information (shotgun reads versus large-insert clones). Specifically, paired-end mapping information was incorporated in the following steps, where subsequent linkages might not supplant linkages that had been previously identified at a more stringent threshold: at least four paired-end linkages from large-insert clones, at least four paired-end linkages from large-insert clones or whole-genome shotgun sequence clones, three paired-end linkages from large-insert clones, three paired-end linkages from large-insert clones or whole-genome shotgun sequence clones, two paired-end linkages from large-insert clones, two paired-end linkages from large-insert clones or whole-genome shotgun sequence clones, a single paired-end linkage from a large-insert clone and, finally, a single paired-end linkage from a whole-genome shotgun sequence clone.

Characterization of repetitive sequences. Repetitive sequences were collected with RECON (v1.06; see URLs), with a cutoff of ten copies, and sequences were further curated to verify their identity, individuality and 5′ and 3′ boundaries. Each sequence was searched against the sea lamprey genomic sequences, and at least ten hits (BLASTN E < 1 × 10^−10) plus 100 bp of 3′ and 5′ flanking sequence were recovered. If a particular lamprey sequence was found to be similar to a known transposon at the nucleotide or protein level (BLASTN or BLASTX, respectively; E < 1 × 10^−5; RepBase14.12), it was assigned to that repeat class. Recovered sequences were then aligned using dialign 2 (ref. 39), with the resulting output examined for the presence of possible boundaries between putative elements and the possible presence of target site duplications. Repeats were additionally searched for homology to known repeat classes in Repbase 14.12 (see URLs), using RepeatMasker and BLAST (BLASTX E < 1 × 10^−5), to identify elements similar to other known transposable elements.

Gene annotation. Annotations for the lamprey genome assembly were generated using the automated genome annotation pipeline MAKER (ref. 5), which aligns and filters EST and protein homology evidence, identifies repeats, produces ab initio [...] produce final downstream gene models along with quality control statistics. Inputs for MAKER included the [...] a species-specific repeat library and protein databases containing all annotated Swiss-Prot proteins for 14 metazoans ([...]) and [...] (cartilaginous fishes) and Myxinidae (hagfishes) in the NCBI protein database [...] Ab initio [...] SNAP [...] processed by the programs tophat and cufflinks ([...]

Identification of CNEs. [...] homologous to conserved noncoding sequences previously identified in comparisons between human and Fugu genomes. BLASTN (2.2.25+) was used with the word size set to 5 and gap existence and extension penalties of 1.

Codon usage. Analysis of codon usage and amino acid composition in lamprey genes was performed using predicted coding sequences after discarding all but the longest transcript variant for each gene. To avoid any bias imparted by small sequences, sequences shorter than 300 bp were excluded from analyses of GC content, leaving a total of 18,444 coding sequences. Overall GC content and GC content at third codon positions were calculated for each protein-coding gene, and the GC content was calculated for the 10-kb fragment harboring the gene(s). To investigate the possible influence [...]

Hox genes. [...] selected a series of BACs via hybridization to a known lamprey transcript (GenBank accession [...]). BACs were selected by hybridization to [...] and were pooled and sequenced by 454 sequencing.

[...] were limited to single-copy genes and duplicates that were broadly distributed throughout the genome and present at relatively low copy number by removing redundant copies of tandemly duplicated genes (lineage-specific amplifications) and homology groups that contained more than six homologs in either of the two species being compared in any pairwise analysis. [...] may not identify some duplicates that have undergone exceedingly rapid diversification after duplication.
Second, but analyses genome the to uniformly applied be can and rate divergence the in variation some permits convention This to of human the or gene lamprey genomes). models comparison chicken ment (TBLASTN score (bit 90% alignment of within score) the top-scoring or align an genomes two the between alignment highest-scoring the yielded genomics. Comparative gnathostomes. and lamprey the the in of families gene evolution the CAFE package The software trees. neighbor-joining three and maximum-likelihood two using family each for tree consensus a struct built for each cluster using MCoffee similarity sequence their to according compara database, Build 64 (ref.Ensembl the and pipeline reconstruction tree Ensembl the using performed was and 3 outgroups 2 additional genomes, vertebrate 50 including Phylogenetic analysis of lamprey genes. calculated analysis. we correspondence a performed and values RSCU sequences, protein-coding concatenated species-by-species invertebrates and vertebrates Ensembl diverse from for sequences genome- protein-coding downloaded we wide species, other to relative regions ­protein-coding URLs). (see values RSCU on (COA) analysis correspondence performed we composition, acid on genes of the basis RNA-seq Toreads. 
bias usage and codon amino- analyze expressed lowly 50 and expressed highly 50 of content GC the compared we composition, amino-acid and bias usage codon on levels expression gene of To assess the possible deviation of the sequence properties of lamprey lamprey of properties sequence the of deviation possible the assess To 4 4 4 8 and Augustus and and on amino-acid composition values using the software CodonW gene predictions, infers 5 infers predictions, gene gene predictions were produced inside of MAKER by the programs programs the by MAKER of inside produced were predictions gene 4 To supplement the assembly of –containing regions, we regions, of Hox gene–containing To assembly the supplement 1 protein database and all sequences for Chondrichthyes (carti Chondrichthyes for sequences all and database protein Genome-wide assessment of codon usage bias and amino- and bias usage codon of assessment Genome-wide 1 7 and the archives for individual genome projects. Using Using projects. genome individual for archives the and 4 5 . MAKER was also passed passed also was MAKER . The lamprey assembly was searched for sequences sequences for searched was assembly lamprey The Regions were considered putative orthologs if they they if orthologs putative considered were Regions Supplementary Supplementary Note 5 P. marinus 0 ). All genes were clustered with hcluster_sg ′ 5 and 3 and 3 4 , , and TreeBeST 7 5 and human and and human and 2 . A multiple-sequence alignment was was alignment multiple-sequence A . A genome-wide phylogenetic analysis Hox4 ′ UTRs and integrates these data to to data these integrates and UTRs genome assembly, or or A ) ) combined with the Uniprot/ Hox2 Hox9 Y Supplementary Note Supplementary 4 P.marinus 5 9 1 7 was then used to recon probe designed from a from designed probe homeodomain probes probes homeodomain 3 1 Callorhinchus milii Callorhinchus 4 doi:10.1038/ng.2568 5 ). Another series of series ). Another 4 was used to was used study P. 
marinus RNA-seq data data RNA-seq ) ESTs, 4 42 6 . , 4 3 4 5 3 8 1 9 6 - - - - - . ;
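The stepwise use of paired-end linkages described under Genome assembly can be illustrated with a short sketch. This is not the Arachne ExtendHaploSupers implementation. The `Link` record, the tier table and the rule that each contig is joined only once are simplifying assumptions made for illustration, and "large-insert clones or whole-genome shotgun sequence clones" is read here as a combined count of supporting read pairs.

```python
from dataclasses import dataclass

# Tiers in the order given in the text: (minimum supporting read pairs,
# whether whole-genome shotgun pairs may be counted with large-insert pairs).
# The last tier picks up links supported by a single shotgun pair only.
TIERS = [
    (4, False), (4, True),
    (3, False), (3, True),
    (2, False), (2, True),
    (1, False), (1, True),
]


@dataclass(frozen=True)
class Link:
    """Hypothetical record of paired-end support between two contigs."""
    contig_a: str
    contig_b: str
    large_insert_pairs: int
    shotgun_pairs: int


def tier_of(link):
    """Return the index of the first (most stringent) tier the link meets, or None."""
    for index, (min_pairs, allow_shotgun) in enumerate(TIERS):
        support = link.large_insert_pairs
        if allow_shotgun:
            support += link.shotgun_pairs
        if support >= min_pairs:
            return index
    return None


def join_in_tiers(links):
    """Accept links from the strictest tier down.

    Simplification (an assumption): a contig joined at a stricter tier is
    never re-joined at a later tier, so later links cannot supplant it.
    """
    ranked = sorted(
        ((tier_of(link), link) for link in links if tier_of(link) is not None),
        key=lambda pair: pair[0],
    )
    used = set()
    accepted = []
    for tier, link in ranked:
        if link.contig_a in used or link.contig_b in used:
            continue  # already claimed by a link from a stricter tier
        used.update((link.contig_a, link.contig_b))
        accepted.append((tier, link))
    return accepted
```

In this sketch a link with four large-insert pairs is accepted before a competing link that reaches four pairs only with shotgun support, which mirrors the priority order of the listed steps.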

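The GC measurements described under Codon usage (overall GC content, GC content at third codon positions and the 300-bp length filter) can be sketched as follows. The function names and the dictionary input are hypothetical, and the sketch assumes each coding sequence is supplied in frame, starting at the first position of its first codon; it is not the CodonW implementation.

```python
def gc_fraction(seq):
    """Fraction of G or C among the unambiguous bases (A, C, G, T) of seq."""
    bases = [base for base in seq.upper() if base in "ACGT"]
    if not bases:
        return 0.0
    return sum(base in "GC" for base in bases) / len(bases)


def gc3_fraction(cds):
    """GC fraction at third codon positions of an in-frame coding sequence."""
    return gc_fraction(cds[2::3])


def gc_stats(cds_by_gene, min_length=300):
    """Return {gene: (GC, GC3)} for coding sequences of at least min_length bp.

    Sequences shorter than min_length are skipped, following the 300-bp
    cutoff stated in the text.
    """
    return {
        gene: (gc_fraction(cds), gc3_fraction(cds))
        for gene, cds in cds_by_gene.items()
        if len(cds) >= min_length
    }
```

A usage example, with made-up sequences: `gc_stats({"geneA": "ATGGCC" * 50, "geneB": "ATGGCC" * 10})` keeps only `geneA`, because `geneB` is 60 bp long.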
Identification of vertebrate-specific genes. All P. marinus predicted peptides were aligned to peptides of all gnathostome species (Ensembl version 58; ref. 55) using BLASTP. All gnathostome peptide sequences that showed a maximal bit score of no less than 50 were used as query in a BLASTP search against invertebrate peptide sequences. This invertebrate database included all sequences available in Ensembl and GenBank for invertebrates, as well as all peptides predicted in the genomes of Schistosoma japonicum (ref. 56), Schistosoma mansoni (ref. 57) and Lottia gigantea (ref. 42). All gnathostome query sequences with identifiable homologs in lamprey but not in any invertebrate were considered candidate vertebrate-specific genes. Candidates with bit scores between 50 and 60 were regarded as valid if the best hit from a reciprocal BLASTP search was the starting query sequence itself or its homolog with a bit score of no less than 50.

Immunity-related gene families. To understand the relationships among individual members of gene families, neighbor-joining trees were constructed in MEGA5 (ref. 58) using complete gap deletion.

The Shh enhancer ShARE. The genomic sequences of jawed vertebrates and the lamprey were compared with mVISTA (ref. 59) using the mouse as a reference.

28. Osoegawa, K. et al. An improved approach for construction of bacterial artificial chromosome libraries. Genomics 52, 1–8 (1998).
29. Waterston, R.H. et al. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).
30. Warren, W.C. et al. Genome analysis of the platypus reveals unique signatures of evolution. Nature 453, 175–183 (2008).
31. Warren, W.C. et al. The genome of a songbird. Nature 464, 757–762 (2010).
32. Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol. 5, e254 (2007).
33. Ewing, B., Hillier, L., Wendl, M.C. & Green, P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185 (1998).
34. Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998).
35. Zhang, Z., Schwartz, S., Wagner, L. & Miller, W. A greedy algorithm for aligning DNA sequences. J. Comput. Biol. 7, 203–214 (2000).
36. Jaffe, D.B. et al. Whole-genome sequence assembly for mammalian genomes: Arachne 2. Genome Res. 13, 91–96 (2003).
37. Bao, Z. & Eddy, S.R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12, 1269–1276 (2002).
38. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).

39. Morgenstern, B. DIALIGN: multiple DNA and protein sequence alignment at BiBiServ. Nucleic Acids Res. 32, W33–W36 (2004).
40. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
41. UniProt Consortium. The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res. 38, D142–D148 (2010).
42. Simakov, O. et al. Insights into bilaterian evolution from three spiralian genomes. Nature 493, 526–531 (2013).
43. Pruitt, K.D., Tatusova, T., Klimke, W. & Maglott, D.R. NCBI Reference Sequences: current status, policy and new initiatives. Nucleic Acids Res. 37, D32–D36 (2009).
44. Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).
45. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439 (2006).
46. Trapnell, C., Pachter, L. & Salzberg, S.L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
47. Kenyon, E.J., McEwen, G.K., Callaway, H. & Elgar, G. Functional analysis of conserved non-coding regions around the short stature hox gene (shox) in whole zebrafish embryos. PLoS ONE 6, e21498 (2011).
48. Sharp, P.M. & Li, W.H. The Codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15, 1281–1295 (1987).
49. Peden, J.F. Analysis of codon usage. (University of Nottingham, 2000).
50. Vilella, A.J. et al. EnsemblCompara GeneTrees: complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 19, 327–335 (2009).
51. Ruan, J. et al. TreeFam: 2008 Update. Nucleic Acids Res. 36, D735–D740 (2008).
52. Smith, T.F. & Waterman, M.S. Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981).
53. Wallace, I.M., O'Sullivan, O., Higgins, D.G. & Notredame, C. M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res. 34, 1692–1699 (2006).
54. De Bie, T., Cristianini, N., Demuth, J.P. & Hahn, M.W. CAFE: a computational tool for the study of gene family evolution. Bioinformatics 22, 1269–1271 (2006).
55. Hubbard, T.J. et al. Ensembl 2009. Nucleic Acids Res. 37, D690–D697 (2009).
56. Schistosoma japonicum Genome Sequencing and Functional Analysis Consortium. The Schistosoma japonicum genome reveals features of host-parasite interplay. Nature 460, 345–351 (2009).
57. Berriman, M. et al. The genome of the blood fluke Schistosoma mansoni. Nature 460, 352–358 (2009).
58. Tamura, K., Dudley, J., Nei, M. & Kumar, S. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol. Biol. Evol. 24, 1596–1599 (2007).
59. Frazer, K.A., Pachter, L., Poliakov, A., Rubin, E.M. & Dubchak, I. VISTA: computational tools for comparative genomics. Nucleic Acids Res. 32, W273–W279 (2004).