LETTER OPEN doi:10.1038/nature14668

The octopus genome and the evolution of cephalopod neural and morphological novelties

Caroline B. Albertin1*, Oleg Simakov2,3*, Therese Mitros4, Z. Yan Wang5, Judit R. Pungor5, Eric Edsinger-Gonzales2,4, Sydney Brenner2, Clifton W. Ragsdale1,5 & Daniel S. Rokhsar2,4,6

Coleoid cephalopods (octopus, squid and cuttlefish) are active, 97% of expressed -coding and 83% of the estimated resourceful predators with a rich behavioural repertoire1. They 2.7 gigabase (Gb) genome size (Methods and Supplementary Notes have the largest nervous systems among the invertebrates2 and 1–3). The unassembled fraction is dominated by high-copy repetitive present other striking morphological innovations including cam- sequences (Supplementary Note 1). Nearly 45% of the assembled gen- era-like eyes, prehensile arms, a highly derived early embryogenesis ome is composed of repetitive elements, with two bursts of transposon and a remarkably sophisticated adaptive colouration system1,3.To activity occurring ,25-million and ,56-million years ago (Mya) investigate the molecular bases of cephalopod brain and body (Supplementary Note 4). innovations, we sequenced the genome and multiple transcrip- We predicted 33,638 protein-coding genes (Methods and Supple- tomes of the California two-spot octopus, Octopus bimaculoides. mentary Note 4) and found alternate splicing at 2,819 loci, but no locus We found no evidence for hypothesized whole-genome duplica- showed an unusually high number of splice variants (Supplementary tions in the octopus lineage4–6. The core developmental and neur- Note 4). A-to-G discrepancies between the assembled genome and onal repertoire of the octopus is broadly similar to that found transcriptome sequences provided evidence for extensive mRNA edit- across invertebrate bilaterians, except for massive expansions in ing by adenosine deaminases acting on RNA (ADARs). Many candid- two gene families previously thought to be uniquely enlarged in ate edits are enriched in neural tissues7 and are found in a range of gene vertebrates: the protocadherins, which regulate neuronal develop- families, including ‘housekeeping’ genes such as the tubulins, which ment, and the C2H2 superfamily of zinc-finger transcription fac- suggests that RNA edits are more widespread than previously appre- tors. Extensive messenger RNA editing generates transcript and ciated (Extended Data Fig. 1 and Supplementary Note 5). protein diversity in genes involved in neural excitability, as prev- Based primarily on number, several researchers pro- iously described7, as well as in genes participating in a broad range posed that whole-genome duplications were important in the evolu- of other cellular functions. We identified hundreds of cephalopod- tion of the cephalopod body plan4–6, paralleling the role ascribed to the specific genes, many of which showed elevated expression levels in independent whole-genome duplication events that occurred early in such specialized structures as the skin, the suckers and the nervous vertebrate evolution11. Although this is an attractive framework for system. Finally, we found evidence for large-scale genomic rear- both expansion and increased regulatory complexity rangements that are closely associated with across multiple genes, we found no evidence for it. The gene family expansions. Our analysis suggests that substantial expansion of a expansions present in octopus are predominantly organized in handful of gene families, along with extensive remodelling of gen- clusters along the genome, rather than distributed in doubly conserved ome linkage and repetitive content, played a critical role in the synteny as expected for a paleopolyploid12,13 (Supplementary Note 6.2). evolution of cephalopod morphological innovations, including Although genes that regulate development are often retained in multiple their large and complex nervous systems. copies after paleopolyploidy in other lineages, they are not generally Soft-bodied cephalopods such as the octopus (Fig. 1a) show remark- expanded in octopus relative to limpet, oyster and other invertebrate able morphological departures from the basic molluscan body plan, bilaterians11,14 (Table 1 and Supplementary Notes 7.4 and 8). including dexterous arms lined with hundreds of suckers that function Hox genes are commonly retained in multiple copies following as specialized tactile and chemosensory organs, and an elaborate chro- whole-genome duplication15.InO. bimaculoides, however, we found matophore system under direct neural control that enables rapid only a single Hox complement, consistent with the single set of Hox changes in appearance1,8. The octopus nervous system is vastly modi- transcripts identified in the bobtail squid Euprymna scolopes with fied in size and organization relative to other molluscs, comprising a PCR16. Remarkably, octopus Hox genes are not organized into clusters circumesophageal brain, paired optic lobes and axial nerve cords in as in most other bilaterian genomes15, but are completely atomized each arm2,3. Together these structures contain nearly half a billion (Extended Data Fig. 2 and Supplementary Note 9). Although we can- neurons, more than six times the number in a mouse brain2,9. Extant not rule out whole-genome duplication followed by considerable gene coleoid cephalopods show extraordinarily sophisticated behaviours loss, the extent of loss needed to support this claim would far exceed including complex problem solving, task-dependent conditional dis- that which has been observed in other paleopolyploid lineages, and it is crimination, observational learning and spectacular displays of cam- more plausible that chromosome number in coleoids increased by ouflage1,10 (Supplementary Videos 1 and 2). chromosome fragmentation. To explore the genetic features of these highly specialized animals, Mechanisms other than whole-genome duplications can drive we sequenced the Octopus bimaculoides genome by a whole-genome genomic novelty, including expansion of existing gene families, evolu- shotgun approach (Supplementary Note 1) and annotated it using tion of novel genes, modification of gene regulatory networks, and extensive transcriptome sequence from 12 tissues (Methods and reorganization of the genome through transposon activity. Within Supplementary Note 2). The genome assembly captures more than the O. bimaculoides genome, we found evidence for all of these

1Department of Organismal Biology and Anatomy, University of Chicago, Chicago, Illinois 60637, USA. 2Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa 9040495, Japan. 3Centre for Organismal Studies, University of Heidelberg, 69117 Heidelberg, Germany. 4Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA. 5Department of Neurobiology, University of Chicago, Chicago, Illinois 60637, USA. 6Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA. *These authors contributed equally to this work.

220 | NATURE | VOL 524 | 13 AUGUST 2015 G2015 Macmillan Publishers Limited. All rights reserved LETTER RESEARCH

a b PF08266/00028 Cadherin PF05375 Pacifastin_I PF02868/01447 Peptidase_M4 PF02037 SAP PF06083 IL17 PF00002 7tm_2 PF07690 MFS_1 PF14830 Haemocyan_bet_s PF13465/00096 zf-C2H2 PF05970 PIF1 PF00264 Tyrosinase PF00582 Usp PF00147 Fibrinogen_C PF00024 PAN_1 PF01582 TIR PF00092 VWA PF01531 Glyco_transf_11 PF02931 Neur_chan_LBD PF01607 CBM_14 PF05485 THAP Bfl Xtr Lgi Cgi Cel Pfu Obi Cte Dre Hro Lch Hsa Gga Dme Mmu –2 0 2 Row Z-score

Figure 1 | Octopus anatomy and gene family representation analysis. lophotrochozoans (green) and molluscs (yellow), including O. bimaculoides a, Schematic of Octopus bimaculoides anatomy, highlighting the tissues (light blue). For a domain to be labelled as expanded in a group, at least 50% of sampled for transcriptome analysis: viscera (heart, kidney and its associated gene families need a corrected P value of 0.01 against the outgroup hepatopancreas), yellow; gonads (ova or testes), peach; retina, orange; optic average. Some Pfams (for example, Cadherin and Cadherin_2) may occur lobe (OL), maroon; supraesophageal brain (Supra), bright pink; subesophageal in the same gene, however multiple domains in a given gene were counted brain (Sub), light pink; posterior salivary gland (PSG), purple; axial nerve cord only once. Bfl, Branchiostoma floridae; Cel, Caenorhabditis elegans; Cgi, (ANC), red; suckers, grey; skin, mottled brown; stage 15 (St15) embryo, Crassostrea gigas; Cte, Capitella teleta; Dme, Drosophila melanogaster; Dre, aquamarine. Skin sampled for transcriptome analysis included the eyespot, Danio rerio;Gga,Gallus gallus;Hsa,Homo sapiens;Hro,Helobdella robusta; shown in light blue. b, C2H2 and protocadherin domain-containing gene Lch, Latimeria chalumnae; Lgi, Lottia gigantea;Mmu,Mus musculus; Obi, O. families are expanded in octopus. Enriched Pfam domains were identified in bimaculoides;Pfu,Pinctada fucata;Xtr,Xenopus tropicalis. mechanisms, including expansions in several gene families, a suite and octopus protocadherin arrays arose independently. Unlinked of octopus- and cephalopod-specific genes, and extensive genome octopus protocadherins appear to have expanded ,135 Mya, after shuffling. octopuses diverged from squid. In contrast, clustered octopus proto- In gene family content, domain architecture and exon–intron cadherins are much more similar in sequence, either due to more structure, the octopus genome broadly resembles that of the limpet recent duplications or as found in clustered proto- Lottia gigantea17, the polychaete annelid Capitella teleta17 and the cadherins in zebrafish and mammals21. cephalochordate Branchiostoma floridae14 (Supplementary Note 7 The expression of protocadherins in octopus neural tissues (Fig. 2) is and Extended Data Fig. 3). Relative to these invertebrate bilaterians, consistent with a central role for these genes in establishing and main- we found a fairly standard set of developmentally important trans- taining cephalopod nervous system organization as they do in verte- cription factors and signalling pathway genes, suggesting that the brates. Protocadherin diversity provides a mechanism for regulating evolution of the cephalopod body plan did not require extreme expan- the short-range interactions needed for the assembly of local neural sions of these ‘toolkit’ genes (Table 1 and Supplementary Note 8.2). circuits18, which is where the greatest complexity in the cephalopod However, statistical analysis of protein domain distributions across nervous system appears2. The importance of local neuropil interac- animal genomes did identify several notable gene family expansions tions, rather than long-range connections, is probably due to the limits in octopus, including protocadherins, C2H2 zinc-finger placed on axon density and connectivity by the absence of myelin, as (C2H2 ZNFs), interleukin-17-like genes (IL17-like), G-protein- thick axons are then required for rapid high-fidelity signal conduction coupled receptors (GPCRs), chitinases and sialins (Figs 1b, 2 and 3; over long distances. The sequence divergence between octopus and Extended Data Figs 4–6 and Supplementary Notes 8 and 10). The octopus genome encodes 168 multi-exonic protocadherin Table 1 | Metazoan developmental control genes genes, nearly three-quarters of which are found in tandem clusters on the genome (Fig. 2b), a striking expansion relative to the 17–25 genes found in Lottia, Crassostrea gigas (oyster) and Capitella gen- Obi Lgi Cte Dme Cel Bfl Hsa omes. Protocadherins are homophilic cell adhesion molecules whose Ligands function has been primarily studied in mammals, where they are Fibroblast growth factor 3 2 1 3 3 8 22 required for neuronal development and survival, as well as synaptic Wnt 12 10 12 7 5 17 19 specificity18. Single protocadherin genes are found in the invertebrate TGFb/BMP 12 9 14 6 5 22 33 deuterostomes Saccoglossus kowalevskii (acorn worm) and Strongylo- Delta/Jagged 4 1 1 2 4 2 7 Hedgehog 1 1 1 1 0 1 3 centrotus purpuratus (sea urchin), indicating that their absence in Axon guidance 10 9 9 6 8 23 33 Drosophila melanogaster and Caenorhabditis elegans is due to gene Transcription factors loss. Vertebrates also show a remarkable expansion of the protocad- C2H2 zinc-finger 1,790 413 222 326 211 1,338 764 herin repertoire, which is generated by complex splicing from a clus- Homeodomain 114 121 111 104 99 133 333 High mobility group 23 15 14 13 16 51 125 tered locus rather than tandem (reviewed in ref. 19). Helix loop helix 50 63 64 59 42 78 118 Thus both octopuses and vertebrates have independently evolved a Nuclear hormone receptor 40 44 45 16 274 33 48 diverse array of protocadherin genes. Fox 16 28 26 17 18 42 43 A search of available transcriptome data from the longfin inshore Tbox 9 9 7 8 21 9 18 squid Doryteuthis (formerly, Loligo) pealeii20 also demonstrated an Number of members of developmental ligand and transcription factor families from O. bimaculoides and selected other taxa. Dendrogram above species names reflects their evolutionary relationships. Bfl, expanded number of protocadherin genes (Supplementary Note Branchiostoma floridae; Cel, Caenorhabditis elegans; Cte, Capitella teleta; Dme, Drosophila melanogaster; 8.3). Surprisingly, our phylogenetic analyses suggest that the squid Hsa, Homo sapiens; Lgi, Lottia gigantea; Obi, O. bimaculoides.

13 AUGUST 2015 | VOL 524 | NATURE | 221 G2015 Macmillan Publishers Limited. All rights reserved RESEARCH LETTER

H

sa a _dac c

h

so

u

s U

_2_CRA _ 0

o 8

IX R *C

R _

5

A

Z 7 1

oRR _a_ - 09 _

1 3 A m Hsa 7 C A 4 H

1 H 8 L m

H p

r

1 T

m - t D Hsa_ E s Hsa g g 1 H Am h 3 7 e h s p o H C r2 1 r 2 VIII Hs a C n A i p H i a_ _ N _ 3 N d s h 54 H _ 5 _ C 5 h sa _ li h P CDH- X 1 4. a 8 1 4 r r v i s v 3 H 1743 sa P C 2 d C r _ 1 P_C p i d Hsa a_CDH- i dhr 4 _ P _ 0 _ 8 D r h - a_CD D 2 h s r 2_I _ CDH _ 18 8 03 1 h h P T C Cdh h 5 CD d C 494 C 3 1 C 0 1 d 3 CH d a h i C C i Hs _ C 7 2 6 H dh 2 _ r i r 4 p A h 9 0

H D D _ C C 2 R h 0p C 1 C 0 27 5_ D H 9 _ D DH a h C1 938-9_ s F 6 C 5 D r P 6 i 8 9 v 54. Hs 35 Cdh p H m -

H H- 0 a_ s 24_ S R s 4 D Cd 92 5 fl D 9 G 2 v 5 a S me H H- v phiC 4 0 h Hs o _ H-11 X 5 I a 9 - o v 1 4_ H N am A 4 5 - . H _ H m A Hsa N 6 C H Cd 28 1 - N NvHedg a flam T0 sa 1 1 p 54 L _ 8 _ 23 C Nv m 6 5 3 4 7 1 1 -2 8 N v A a - 1 9_ A 64 3- 2 . Hs _ C D 1 1 5 r _ g Amph e _ 1 9 1 A F _ i _C D 9 i _1 8 -92 0 sta 8_ N t 1 9 r1 B n i DH H- Lgi_1 i h 3 CDH_ 8 2 75 3_ l C _ 1 86 a 4_ _ _ H Am 8 6 a g C 2 h 2 A i N _ 2 i_ 0 _CD n t 1 CDH- 2 mi o o L D Lg gi_ 2 _2 r - 1 Lg A 45 go_ C8 e r vC 6 1 g 2 Cd _ R g - 1 L i_22 69 h 2 b ry H 8 2 H _1825 1 1 L 2 ng 3_ i 2 g iCd d 2 A R oRR hi H_ _ H_E E E5 _ E 95 -6 0 gi_ 1 _ L h H- 4 m 2 2 L RR C AA8 2a o G 2 EA n L CD i a 26 SR 9 D _ 3 3 o CD c _ F_ p L3DKX i mp H- -4_ Dsc G A 1 ght Cte_ - 1 1_ mp C _ c3 W h _ A ph H - F_LA L A 9 94 sa_ _ D 2 Ds s i C A AA A 64914. CELSR 9 _N D H _ H F H m 3 5 D 10 sa 5 n- s G_ 6 5 a C - _ 1 i _ A CDH_C l a 1930. H s a_C H -1 3 l 3 1 G G X n-2 _ 7p s a_ 1 3_ i - _a 0 H _CD D H o l VII PC _7 s l 2 00 H a _E- c 26 Ct gr H- -1 o llin Hs C pG 1 H a o 4_ D H o in-4_ - n_ C t e 80. _ 1 Hs s i e _CD D m n 2 a_F 2 H- R 1 H sa_C CD s i - e C t _ a _ e_1 2 _ e moc n t 2 1 H s _C s le i e 8 8 2 d moc e a 1 _ H a e s l A _ 8 8 _ og prot t 225 7 1 Hsa s d e 4_ m 7 5 a smogle o 8 2 H s _ d R 3 5 a _ e sm 1 Q p H mog ke_ R5 Cte_2 721 5 8 a d e li o 6V0I h _ s Dme_f i 4 Hs s 36_ R L F _d e H- 16_ R H sa d g A a D 22 i 7. _ H- C13E _ 2 T H a C 3 1 20 2 Hs s _ D dhr13 8 05 C un _ at H a C C 0 s _ _CX g 16 t 6 3 8 H phi 6 N L 2_W 8 ho v 2 3 Hsa m 4 9 1 N C T R5 _ A _s r1 7. vC d W R e e N h o h k o D D m _2174922404 d i m v r6 e 2 62 l 727 R N C H D iC R 3 R C e_C vC D 2 Ct e_ 83 7 1 te_ H t ph 8 4 D 3 C ELS 02 in 0 D H1 _1 C 0 n L _ 2 i 0 C 2 H Am te h 6 te gi_ 1 6 -N p n 6 C XP_ 4 7 5 y Nv 5 2 1 E 5 3 r1 ls C 6 4 4 Am 2 a FA 7 L2 _pp_ _ h c o te 0 s _ R Lgi _ T 5 gi 16141 8 R10 2 -like 3 L i_ 2 g hiCd 2 _C2 A _ 27 p 861 0 5 H m 3 1 L 2 9 sa p _ 0 4 0 C3 4 Am i_23 2 T0 II VI H _ h 4 5 _ 1_1 s P iF 3 87 g 8 21 a C A L L e_2 5 n- H _ D E4 t 0 1 ni 3 s PC H T C R 66 6 e a_ -l R 11 5 t P D _F i _ n nin- H a k o 81 y A C _ t e gi s te 1 m D F _ L _1 al p H a 2 te syn N in- h _F t_ C _c l T L i a 3 S ten g Cd t sa _ca L n A i_ _1 H a y m 1 h hiC ls p 54 r1 Hs C h 2 2 Ca o te iCdh 0 Amp R _ 8 e_ N 51 Dme R 228 m ST 8 r8 D 04 99 44 CL 21 00 _C _C Nv _ 0 Ct DH 3 10 3 12 Cte 65 e_21 _ X e_2 6 2 L 87 2 oRR gi_ 5 A Ct _2 2 5 e 8 6 2924 53 Ct _21000 Am 81 te 2924 4 p -2_ C _2 91 h 9 9 6 Am iCdh C Lgi 1 13 _15 9 4 p gi r1 Amp hi r1 L 159 h Cdhr 6 Cd C h Lgi_ i 36 te_ iCd 7 ph 1 m 302 2 01. Lg 19 hr A 2 i 9 6 i_ 314 _2 17 Lg 23497 27 Lg 2 1 i_ 0 oR i 92 P0 R _1 47 Lg X 28 Hsa_ 88 54 h_ 8 8_ 5 s_ 71 da C C 44 1 71 ch te 12 gi_ 378 so _1 D3 L 2 0 us Ct 39 _1_ e_ 52 Lgi_ 3787 EA 13960 2 _2 69 gi 8 Hs W6 L 6 a 86 9 _237 _PC 8 Lgi A 3.1 68 mp DH-1 gi_17182 D hiC L 378 s V m d 6 _2 84 _pc e_ 7 d hr Lgi 36 dh ac 18 2 3 11_ Lg hs i_ 662 Xl i_ ous Lg 16 inke 14 i_ 27 d_ Ct 30 Lg X e_ 94 2388 P_ 135 gi_ 73 002 638 L 714 A 7413 i_1 0 m 9 Lg 768 ph 0.1 23 A iPC gi_ 88 mp DH2 L 78 hiPC 16 Hs D i_ 676 a_PC H Lg Hsa DH- 1 _237 _P_PCD 12 Lgi 4389p _16 Hsa H Lgi 623 _PC -17 175 Hs DH- gi_ 9 a_P 19_a L 303 CDH i_17 Hsa_P -19 Lg 823 Hsa C _c 173 _PC DH- Lgi_ 7p DH 10_1 786 Hsa_ _alph i_23 PCD a-C Lg 5 H H_a 2_1 16587 sa_ lpha- Lgi_ PCDH C1_ 822 H _a 1 _171 sa_P lpha Lgi 1 CDH -5_1 7182 Hs _alpha Lgi_1 a_PC -6_ 92 DH_ 1 2327 Hsa alpha Lgi_ 0 _PCD -8_1 dhr2 H_al phiC Hsa_ pha-1 Am 7 PCDH 1_1 20161 _alph Cte_ Hsa_P a-10_ 3 CDH_ 1 a_CDHR_ alpha- Hs Hsa_P 12_1 64 CDH_ gi_1565 alpha-1 L Hsa_PC _1 6560 DH_alph Lgi_15 Hsa a-2_1 3 _PCDH 180_C _alpha- oRR Hsa_ 13_1 DF PCDH_a oRR277_C2 X lpha-3_1 060_C2D Hsa_PCD 6-8_T H_alph opRR52 a-4_1 K Hsa_PCDH RR836_C5 _alpha- o 7_1 2259 Hsa_PCD Lgi_23 H_alpha-9_2 Hsa_PC Lgi_160343 DH_alpha-9_1 H Cte_199156 sa_PCDH_beta-13 oRR625_C15 Hsa_PCDH_beta-8 Lgi_161952 Hsa_PCDH_beta-15 Hsa_Ret_a Hsa_PCDH_beta-6 Dme_CDH_96Ca Hsa_PCDH_beta-5 oE098_C4MR_IZV Hsa_PCDH_beta-4 oRR092_C2_IZ Hsa_PCDH_beta-3 oRR444_C2Y_IZ Hsa_PCDH_beta-2 opRR671_C3A_IZ Hsa_PCDH_beta-7 oC831_C2X_IZ IV PCDH_beta-10 Hsa_ oC826_C2Y ta-9 _IZ Hsa_PCDH_be oRR894_C2D H_beta-14 Y_IZ Hsa_PCD oT039_ a-11 C3X2_IZ PCDH_bet Hsa_ oC832_C eta-12 3_IZ sa_PCDH_b oE097 H _1 _oT898_C mma-C5 3_IZ PCDH_ga oE093_ Hsa_ C4_1 C6TDS_I amma- op ZUOJV PCDH_g E100_C3 Hsa_ C3_1 DR_UOW _gamma- opC829 V sa_PCDH -B2_1 _C3D H amma opC R_ZUO CDH_g 1 825_C WV Hsa_P a-B1_ 3DR_U gamm opRR OWV DH_ _1 578_C3 Hsa_PC B7 o DR_ gamma- pC828 UOQV PCDH_ 6_1 _C3DRS_ Hsa_ -B oE09 U _gamma _1 6_C OWV PCDH -B5 6A_Z Hsa_ mma oRR UOW H_ga _1 576_C V _PCD -B4 oT 2_Z Hsa mma 1 096_ H_ga A11_ C3_Z a_PCD ma- oRR0 Hs gam 1 26_ DH_ 12_ o C3A _PC ma-A G80 _Z Hsa gam 0_1 8_C3 H_ -A1 oG8 SX_

U Protocadherins _PCD mma 1 10-8 OW Hsa _ga -A9_ o 13_ JQ CDH ma T18 C6T a_P gam 8_2 6_C DNS Hs H_ a-A oJ 3L_ F_Z PCD 054 Z UO sa_ _gamm 8_1 _C6 WJQ H DH a-A oG8 ATN V _PC mm _1 12_C F_Z Hsa _ga -A5 oJ052 6AT UO ma N WJQ PCDH am 2_1 _C SF V sa_ g A oG 6TDNF _ZU H DH_ 1 809_C OW _PC 1_ o _Z JQ sa H_gamma-a-A J0 6TD UO V H CD _1 53_ NF WJ _P mm A7 oG C3 _Z QV sa H_ga a- 815 _Z UO H CD m 6_1 _C WJ P A op 6T QV Hsa_ DH_gam a- RR DN PC amm 4_1 o 544 F_Z a_ _g -A B3 _C UO Hs DH ma 23 33_ 5T W PC am 57 oJ051 C6 DNFX JV 2 7 TN Hsa_ e_2 41 _C F_I _UO CDH_g Ct 22 oF 3D ZU W a_P 9 973 A_Z OW JQ Hs te_2 225 oB _C J V C 18 32 3H QV e_ 26 3_ TS Ct 79 oB C4 X2 324_C S_ _U e_22 o ZV O Ct B3 6D2 W _2225885 26 V te 607 oB _C X C 22 7 32 5 _IZ e_ 78 o 2_C4S _IZ Q Ct 18 B3 V 0 27 X e_2 39 oT _C4 _ Ct 2 6 8 ZV te8 o 40 S_ C 05 B3 _C ZV 176 72 oRR 17 4X _C _I Cte_ 721 0 0 6D Z te 80 oB 75 S C 18 3 31 _C6 X_ _2 6 o 8_ IZ te B C6 _IZ UW C 9 31 D2 U J te333 96 oB31 6_ OW Q C 28 7 C _Z J V 2 3 oB309 0 6S U e_ 96 _C D W t 34 o 4 _I JQ C _21 0 B3 _C _Z ZU V te 6 65 o 1 O C 15 B31 1 4_Z JV e_ 26 3 _C t e9 3 o 2_ 3 C t L482_ _ C 51 5 o C4 Z 19 8 RR _ _ C Z te 326 31 o 8 4_ C 7 B3 8 Cte 2 0 o 3 3_ Z e6 86 K 2 C t 7 7 _C 3 C 4 oB 4 6 _Z e96 8 9_ T 0 o 3 C D Ct K 29_ 6MS SF 3 7 _1445 0 o 5 C _ te i717 7 K 3 6S _ Z C g 1 74 _C Z U L 7 o N W O i K7 7_ 6 _IZU W g a oB C T JV J L 46_ 6 S QV inked_aed_ p 3 S N O -l 2 oRR8 2 C6 F F nk 9 8 _ _ W _X li 0 o _ T Z Z J 1 - 3 F98 C P U U Q 1 2 7 2 6 SF O O V 1 2 o 6_ D W WQV _ i519650 R 5_ C6 T D J DH- -11_Y g 1 o R S _I Q C gi L 82 R 71 C S F_ Z V P H L i 45 4 o R 6T P U W _ D 9 3 F 1 2 _ J a C Lg 0 0 7 9 2 _C M ZU O Q 2 9 2 o 7 6 F WJ Hs _P i_ 3 T6 7 _ 5 _ OW V a 4 5 oF _ C _ I s g 1 2 1 7 C6 6 U Z Q H L i_ 0 9 1 SF O U J V 9 U o 8 _ M W O V Lg gi6 _ F 2 C _ W L 13 2 o 9 _ 6 S IZWJ V _ C7 X C 7 C M F QV _ 7 2 o 8 5 6 X2 gi 6 E _ S 08 C 2 19 C TS FX _ Q L 9 _ o 8 6 UOWJ 0 20 9 T 1 _ SB X _ V R 5 5 3 o 6 6 C6 _ ZU R 8 R 7 _ Z o R4 gi 716 6 2 o C T N U i 3 R 5 6D S X_ O L g 1 C 9 _ O W Q oR 5 0 o 81 5 C E IZ WJ V L 46 9 I3 3_ 6 S_IZU _ J gi 0 8 0 o 7_C _IZU ZUOW U Q L 4 I31 1 C O Q o 6 V V 0 26 p _C 6 W 1 511 5 o 5 6 M WJQ _ R _ M OW J Lgi_1 i 1 R R 2 S J Q g 10 5 V o R31 C _IZ S N QV V L i_ i 5112 oN6 I31 8 2 F F J V g g ZU 8 X_ _ _ Q ZV o 6_C2 W Z Z L L 10 _ Z 8 4_ _ 7_ _ I _ I U U i 7 6 o 3 5 Z O C W C 2 7_C C5D_ C O V o WJ Lg 81 1 3 X WJ 2_C ZU o H854 _ X 46_ 3_C Q 4 _IZ 4 _ oR RR 8 C Q Q 2 6 UW Q D 85 _C 3 IZ V V R IZ J o _ T R V R 7 _C6TSF _C W o R 6 6 I SF_IZ oRR7 R 7 UOW Q 27 S Z oR o R R2 3_ o 2 O UO V F_ 8 C6_ IZ W H R 6_ _ _ Z oR 7 C R O V 8 760 ZU X oH84 7 C 6 U R 87 T WQ o R 4 _ DF BFX_ 9 6 O o 4 6 O o RR571_ 6 C _ OW L WQ WV _ _ H I W 851_ U W o H852_ 82_C6K C6 6 X Z o TP O O _ZU Z C H T _C A o RR 7_ _ U J 4 TF_ZU U U 5 O _ 6 S J Q o _Z U L H T Z O C 6 _ 5 o H F V 55 Z U O C S UO V _ F Z o R 4 447 T X_ZO WQ 8 _C C T E_ 0 FX _ o R 8 6X S R _C X_ T W R B 3 4 _ 5 V oRR1 C6S C6S 4 F S RR R 4 F Z WJ 8 4 X_Z 8 O _ o 6 _ T _I TS 4 C W _ _C6T X V oR G 5 4 V 731_C 83 IZ O 85 6 _ 5_ZU U oR R C G ZU _ C O V oP09 5 W QV oA C S A8 C C4 G G_ _Z I W _ C W o R5 21_ _C 6 UO Z JQ _ o _ _Z _ U Q V oRR727 R5 FX_ oA 1 _ R 75 _ WQ U J 3 6T 3 oA8 5 0 ZU V oR U 4 846 6 ZU S Z Q 1 W _ o R2 6D O 4 4 _Z UO Z W V 8 C Z oRR 1 9 _ C X O OW V _C 84 5 oRR549 R 3 6T 8 4 WQ _Z V C Z W A oA 41 _ o R 9 _C6 6TS _ 3_ V _ Q V 2 A o R1 1 T C _Z T 4 _ I UOWJ 5 o O Q V o RR 7 C 6 S UOW J oD4 o _ZO _C C o M 8 C6T S _ X 4 W V TS Q oT0 M061_C 9 F oD U V 1 6 8 o N _ 839_ F X QV o 1 T F A8 opA C _C o 0 9_C6 _ F_ X2_ V O V o 7 C Z 1 T_ 3 R p X F Q o o 6 7_C A T V J C6T SF_ _ _ 7 o A836 55 o JQ W o _ o pB 2 U o 6 4 o 6 J V o RR 5 Z Z 4_IZW A8 A S o D 9 WJQ R7 Q R p J III IZU V _C Q M0 7 V 9_ 550_ A T_ Z JQ R V W V O R

V 8 _C Q D Q U OWJQ 3 QV _C6T _ I C5 J 8 Q W J T 8 ZOW FX _ R

WJQV 8 J _ 6 Z R 3 Z oA842 o J J U R B TS 4 O Q 37 3 WJ OW W Q 83 1 C6 _ V S F UO 7 _ 6 Z JQ C6 T W 3 73 20 0 JQ UO U 0 5T

_ J RR UOWJ W 6 I 2 O J OWJ SF_ZOW 8 6 X D 8 6 _ C T_I RR83 S U UW UOW Z O 6T F_ 55_ 5 3 5_ o Z UO _Z 6 _C _C O U C _C 1_C U C6 0 _ o o 6 _ O 1 TS Q

W Q Z 0 5_Z W _ 8 KT _ H S U OW 6 1 _ OWJQ I Z C _ 4 _ _ _ WJ W J ZU UO 6 S _ UW ZU Z 6 C I U _C C KDF ZO C6 V OW C _IZ V V A I _ 0 S_ Z O _ O C T I _ Q 5 6 6 6 J C6T C 5 T _ _ Z ZU UO A Z Z Q J o 4 C 6 6 I 1 U _U 6 C T T S Q _ U TS_I 5 Z U V 7 6T _ 2 X_ J C C Q _C X_ 6 U Q 4 T_ _ T OWJ 6 TFX W F _ 6 V 7_ C 2 Z S _Z _ _ U 2 Z U _ZU 6 T_I Z D V Q F_ _ TX_ 3 C 6 _ S V 7 X O V _ T IZU I W D 3 3 T Z

_ _

_ X 6 _ S Z Z O O 4 J V 6T 2 T 4 _ o T S X I 8 5 X_ U U W T I 6_ 6T T _Z O ZUO Q D JQ 1 U 1 S_ W 27 C _C Z Z ZO WJ p _ _ D - 6 A - D J 2 6 O A85 _C 6TSF Q RR0 Z 8 T U V O W O WJ J IZU o _C I o C6 5 Q J 828_C6 9 o 5 C U o Z o 6 3_ 88 _C U O C QV V V

_ O 4 W WJ Q 3 W Q 3 _ J A A8 U 0 0 BF V O

C oA OWJQ 2 9 WJ WJQ 65 2 W 2 2 Q Q V o o 1 V

O _ V 1 O V W

2 20_C 11_ T 8 T A83 oP Q V 1 V

V

W A8 WJ

8 8 p p Q oN A V o 2 6HT A8 p

o A o

o 8 J V o V o

o

oA8 oA Q

Q A

V

o

16_C

8

A

o b Scaffold 30672

100 kb

20 kb Cadherins Scaffold 9600 OL Ova Sub Skin PSG St15 ANC Supra Retina Testes Viscera 20 kb Suckers

–3 –2 –1 0 1 2 3 Row Z-score Figure 2 | Protocadherin expansion in octopus. a, For a larger version of contain the two largest clusters of protocadherins, with 31 and 17, respectively. panel a, see Extended Data Fig. 11. Phylogenetic tree of cadherin genes in Hsa Clustered protocadherins vary greatly in genomic span and are oriented in a (red), Dme (orange), Nematostella vectensis (mustard yellow), Amphimedon head-to-tail manner along each scaffold. c, Expression profiles of 161 queenslandica (yellow), Cte (green), Lgi (teal), Obi (blue), and Saccoglossus protocadherins and 19 cadherins in 12 octopus tissues; 7 protocadherins were kowalevskii (purple). I, Type I classical cadherins; II, calsyntenins; III, octopus not detected in the tissues sampled. Cells are coloured according to number of protocadherin expansion (168 genes); IV, human protocadherin expansion (58 standard deviations from the mean expression level. Protocadherins have high genes); V, dachsous; VI, fat-like; VII, fat; VIII, CELSR; IX, Type II classical expression in neural tissues. Cadherins generally show a similar expression cadherins. Asterisk denotes a novel cadherin with over 80 extracellular pattern, with the exception of a group of sucker-specific cadherins. cadherin domains found in Obi and Cte. b, Scaffold 30672 and Scaffold 9600

a a Scaffold 19852 Figure 3 | C2H2 ZNF expansion in octopus. a, Genomic organization of the largest C2H2 cluster. Scaffold 19852 contains 58 C2H2 genes 100 kb that are transcribed in different directions. bb cc 0.045 b, Expression profile of C2H2 genes along Scaffold C2H2 Zinc finger, tandem C2H2 Zinc finger, non-tandem 19852 in 12 octopus transcriptomes. Neural and 0.25 0.04 c finger, tandems Not clustered developmental transcriptomes show high levels of ger, non-tandem Clustered expression for a majority of these C2H2 genes. 0.035 0.20 In a and b, arrow denotes scaffold orientation. 0.03 c, Distribution of fourfold synonymous site

0.15 transversion distances (4DTv) between C2H2- 0.025 domain-containing genes.

frequency 0.02 Fraction 0.10

0.015

0.05 Scaffold 19852 C2H2 genes 0.01

0.005 0.00

0 0.025 0.175 0.325 0.475 0.625 0.775 0.925 0 0.2 0.4 0.6 0.8 1 OL Ova Sub Skin PSG ANC St15 4DTv4DTv distance Supra Retina Testes Viscera Suckers

–3 –2 –1 0 1 2 3 Row Z-score

222 | NATURE | VOL 524 | 13 AUGUST 2015 G2015 Macmillan Publishers Limited. All rights reserved LETTER RESEARCH squid protocadherin expansions may reflect the notable differences most of which are tandemly arrayed in clusters (Extended Data Fig. 7). between octopuses and decapodiforms in brain organization, which These subunits lack several residues identified as necessary for the have been most clearly demonstrated for the vertical lobe, a key struc- binding of acetylcholine26, so it is unlikely that they function as acetyl- ture in cephalopod learning and memory circuits2,22. Finally, the inde- choline receptors. The high level of expression of these divergent sub- pendent expansions and nervous system enrichment of protocadherins units within the suckers raises the interesting possibility that they act as in coleoid cephalopods and vertebrates offers a striking example of sensory receptors, as do some divergent glutamate receptors in other convergent evolution between these clades at the molecular level. protostomes27. In addition, we identified 74 Aplysia-like and 11 verte- As with the protocadherins, we found multiple clusters of C2H2 brate-like candidate chemoreceptors among the octopus GPCR super- ZNF transcription factor genes (Fig. 3a and Supplementary Note 8.4). family of ,330 genes (Extended Data Fig. 6). The octopus genome contains nearly 1,800 multi-exonic C2H2- We found, amid extensive transcription of octopus transposons, containing genes (Table 1), more than the 200–400 C2H2 ZNFs found that a class of octopus-specific short interspersed nuclear element in other lophotrochozoans and the 500–700 found in eutherian sequences (SINEs) is highly expressed in neural tissues (Supplemen- mammals, in which they form the second-largest gene family23. tary Note 4 and Extended Data Fig. 8). Although the role of active C2H2 ZNF transcription factors contain multiple C2H2 domains that, transposons is unclear, elevated transposon expression in neural in combination, result in highly specific nucleic acid binding. The tissues has been suggested to serve an important function in learning octopus C2H2 ZNFs typically contain 10–20 C2H2 domains but some and memory in mammals and flies28. have as many as 60 (Supplementary Note 8.4). The majority of the Transposable element insertions are often associated with genomic transcripts are expressed in embryonic and nervous tissues (Fig. 3b). rearrangements29 and we found that the transposon-rich octopus gen- This pattern of expression is consistent with roles for C2H2 ZNFs ome displays substantial loss of ancestral bilaterian linkages that are in cell fate determination, early development and transposon silencing, conserved in other species (Supplementary Note 6 and Extended Data as demonstrated in genetic model systems23. Fig. 9). Interestingly, genes that are linked in other bilaterians but not The expansion of the O. bimaculoides C2H2 ZNFs coincides with a in octopus are enriched in neighbouring SINE content. SINE inser- burst of transposable element activity at ,25 Mya (Fig. 3c). The flank- tions around these genes date to the time of tandem C2H2 expansion ing regions of these genes show a significant enrichment in a 70–90 base (Extended Data Fig. 9d), pointing to a crucial period of genome evolu- pair (bp) (31% for C2H2 genes versus 4% for all genes; tion in octopus. Other transposons such as Mariner show no such 216 Fisher’s exact test P value ,1 3 10 ), which parallels the linkage of enrichment, suggesting distinct roles for different classes of transpo- C2H2 gene expansions to b-satellite repeats in humans24.Wealso sons in shaping genome structure (Extended Data Fig. 9c). found an expanded C2H2 ZNF repertoire in amphioxus (Table 1), Transposable element activity has been implicated in the modifica- showing a similar enrichment in satellite-like repeats. These parallels tion of gene regulation across several eukaryotic lineages29. We found suggest a common mode of expansion of a highly dynamic transcrip- that in the nervous system, the degree to which a gene’s expression is tion factor family implicated in lineage-specific innovations. tissue-specific is positively correlated with the transposon load around To investigate further the evolution of gene families implicated in that gene (r2 values ranging from 0.49 in the optic lobe to 0.81 in nervous system development and function, we surveyed genes assoc- the subesophageal brain; Extended Data Fig. 8 and Supplementary iated with axon guidance (Table 1) and neurotransmission (Table 2), Note 4). This correlation may reflect modulation of identifying their homologues in octopus and comparing numbers by transposon-derived enhancers or a greater tolerance for transposon across a diverse set of animal genomes (Supplementary Notes 8–10). insertion near genes with less complex patterns of tissue-specific gene Several patterns emerged from this survey. The gene complements regulation. present in the model organisms D. melanogaster and C. elegans often Using a relaxed molecular clock, we estimate that the octopus and showed striking departures from those seen in lophotrochozoans squid lineages diverged ,270 Mya, emphasizing the deep evolutionary and vertebrates (Table 2 and Supplementary Note 10). For example, history of coleoid cephalopods8,30 (Supplementary Note 7.1 and D. melanogaster encodes one member of the discs large (DLG) family, Extended Data Fig. 10a). Our analyses found hundreds of coleoid- a key component of the postsynaptic scaffold. In contrast, mammals and octopus-specific genes, many of which were expressed in tissues have four DLGs, which (along with other observations) led to sugges- 25 containing novel structures, including the chromatophore-laden skin, tions that vertebrates possess uniquely complex synaptic machinery . the suckers and the nervous system (Extended Data Fig. 10 and However, we found three DLGs in both octopus and limpet, suggesting Supplementary Note 11). Taken together, these novel genes, the that vertebrate and fly gene number differences are not necessarily diagnostic of exceptional vertebrate synaptic complexity (Supplemen- tary Note 10.6). Table 2 | Ion channel subunits Overall, neurotransmission gene family sizes in the octopus were very similar to those seen in other lophotrochozoans (Table 2 and Supplementary Note 10), except for a few strikingly expanded gene Obi Aca Lgi Cte Dme Cel Hsa families such as the sialic acid vesicular transporters (sialins) Voltage-gated calcium channels 8 8 6 10 9 10 10 (Supplementary Note 10.2). We did find variations in the sizes of Voltage-gated sodium channels 3 2 3 2 4 0 13 neurotransmission gene families between human and lophotrochozo- Transient receptor potential channels 36 45 40 43 13 23 29 K1 channels ans (Table 2 and Supplementary Note 10), but no evidence for sys- Voltage-gated 30 23 29 20 10 51 40 tematic expansion of these gene families in vertebrates relative to Calcium-activated, small/large conductance 12 8 9 6 3 6 8 octopus or other lophotrochozoans. Although some gene families were Inward rectifying 3 4 5 6 4 3 16 larger in mammals or absent in lophotrochozoans (for example, Two pore 12 9 12 14 11 47 15 Non-voltage-gated 27 21 26 26 18 72 39 ligand-gated 5-HT receptors), others were absent in mammals and Cys-loop receptors present in invertebrates (for example, anionic glutamate and acetyl- Glutamate 21 15 47 36 30 15 18 choline receptors). The complement of neurotransmission genes Nicotinic acetylcholine 53 16 52 77 10 88 16 in octopus may be broadly typical for a lophotrochozoan, but our Inhibitory acetylcholine 3 2 5 2 0 4 0 5-HT3 0 0 0 0 0 1 5 findings suggest it is also not obviously smaller than is found in mam- GABA 6 5 4 9 3 7 19 mals. Glutamate-gated chloride channels 7 5 8 5 1 6 0

Among the octopus complement of ligand-gated ion channels, we Number of subunits of representative ion channel families in O. bimaculoides and across examined taxa. identified a set of atypical nicotinic acetylcholine receptor-like genes, Dendrogram above species names shows their evolutionary relationships. Aca, Aplysia californica.

13 AUGUST 2015 | VOL 524 | NATURE | 223 G2015 Macmillan Publishers Limited. All rights reserved RESEARCH LETTER expansion of C2H2 ZNFs, genome rearrangements, and extensive 21. Noonan, J. P., Grimwood, J., Schmutz, J., Dickson, M. & Myers, R. M. Gene conversion and the evolution of protocadherin gene cluster diversity. Genome Res. transposable element activity yield a new landscape for both trans- 14, 354–366 (2004). and cis-regulatory elements in the octopus genome, resulting in 22. Shomrat, T. et al. Alternative sites of synaptic plasticity in two homologous ‘‘fan-out changes in an otherwise ‘typical’ lophotrochozoan gene complement fan-in’’ learning and memory networks. Curr. Biol. 21, 1773–1782 (2011). 23. Liu, H., Chang, L. H., Sun, Y., Lu, X. & Stubbs, L. Deep vertebrate roots for that contributed to the evolution of cephalopod neural complexity and mammalian zinc finger transcription factor subfamilies. Genome Biol. Evol. 6, morphological innovations. 510–525 (2014). 24. Eichler, E. E. et al. Complex b-satellite repeat structures and the expansion of the Online Content Methods, along with any additional Extended Data display items zinc finger gene cluster in 19p12. Genome Res. 8, 791–808 (1998). and Source Data, are available in the online version of the paper; references unique 25. Nithianantharajah, J. et al. Synaptic scaffold evolution generated components of to these sections appear only in the online paper. vertebrate cognitive complexity. Nature Neurosci. 16, 16–24 (2013). 26. Brejc, K. et al. Crystal structure of an ACh-binding protein reveals the ligand- Received 26 December 2014; accepted 16 June 2015. binding domain of nicotinic receptors. Nature 411, 269–276 (2001). 27. Croset, V. et al. Ancient protostome origin of chemosensory ionotropic glutamate receptors and the evolution of insect taste and olfaction. PLoS Genet. 6, e1001064 1. Hanlon, R. T. & Messenger, J. B. Cephalopod Behaviour (Cambridge Univ. Press, (2010). 1996). 28. Erwin, J. A., Marchetto, M. C. & Gage, F. H. Mobile DNA elements in the generation of 2. Young, J. Z. The Anatomy of the Nervous System of Octopus vulgaris (Clarendon diversity and complexity in the brain. Nature Rev. Neurosci. 15, 497–506 (2014). Press, 1971). 29. Che´nais, B., Caruso, A., Hiard, S. & Casse, N. The impact of transposable elements 3. Wells, M. J. Octopus: Physiology and Behaviour of an Advanced Invertebrate on eukaryotic genomes: from genome size increase to genetic adaptation to (Chapman and Hall, 1978). stressful environments. Gene 509, 7–15 (2012). 4. Bonnaud, L., Ozouf-Costaz, C. & Boucher-Rodoni, R. A molecular and karyological 30. Strugnell, J., Norman, M., Jackson, J., Drummond, A. J. & Cooper, A. Molecular approach to the taxonomy of Nautilus. C. R. Biol. 327, 133–138 (2004). phylogeny of coleoid cephalopods (Mollusca: Cephalopoda) using a multigene 5. Hallinan, N. M. & Lindberg, D. R. Comparative analysis of chromosome counts approach; the effect of data partitioning on resolving phylogenies in a Bayesian infers three paleopolyploidies in the mollusca. Genome Biol. Evol. 3, 1150–1163 framework. Mol. Phylogenet. Evol. 37, 426–441 (2005). (2011). 6. Yoshida, M. A. et al. Genome structure analysis of molluscs revealed whole genome Supplementary Information is available in the online version of the paper. duplication and lineage specific repeat variation. Gene 483, 63–71 (2011). Acknowledgements We thank C. T. Brown and J. Rosenthal for making Doryteuthis 7. Rosenthal, J. J. & Seeburg, P. H. A-to-I RNA editing: effects on proteins key to RNA-seq data available before publication; C. Ha, J. Orenstein, J. Brandenburger, neural excitability. Neuron 74, 432–439 (2012). M. Glotzer and H. Gui for bioinformatic assistance; S. Shigeno for help with tissue 8. Kro¨ger, B., Vinther, J. & Fuchs, D. Cephalopod origin and evolution. Bioessays 33, dissection; C. Huffard and R. Caldwell for providing the O. bimaculoides specimen used 602–613 (2011). for genomic DNA isolation; and E. Begovic for genomic DNA preparation. This work was 9. Herculano-Houzel, S., Mota, B. & Lent, R. Cellular scaling rules for rodent brains. supported by the Molecular Unit of the Okinawa Institute of Science and Proc. Natl Acad. Sci. USA 103, 12138–12143 (2006). Technology Graduate University (S.B. and D.S.R.) and by funding from the NSF 10. Grasso, F. W. & Basil, J. A. The evolution of flexible behavioral repertoires in (IOS-1354898) and NIH (R03 HD064887) to C.W.R. and from the NSF (DGE-0903637) cephalopod molluscs. Brain Behav. Evol. 74, 231–245 (2009). to Z.Y.W. This work used the Vincent J. Coates Genomics Sequencing Laboratory at UC 11. Holland, P. W., Garcia-Fernandez, J., Williams, N. A. & Sidow, A. Gene duplications Berkeley, supported by NIH S10 instrumentation grants S10RR029668 and and the origins of vertebrate development. Development (Suppl.),125–133 S10RR027303, and the University of Chicago Functional Genomics Facility, supported (1994). by NIH grant UL1 TR000430. 12. Dietrich, F. S. et al. The Ashbya gossypii genome as a tool for mapping the ancient Saccharomyces cerevisiae genome. Science 304, 304–307 (2004). Author Contributions The Chicago and the OIST/Berkeley groups initiated their 13. Kellis, M., Birren, B. W. & Lander, E. S. Proof and evolutionary analysis of ancient transcriptome and genome projects independently. In the subsequent collaboration, genome duplication in the yeast Saccharomyces cerevisiae. Nature 428, 617–624 both groups worked closely on every aspect of the project. Chicago group: C.B.A., Z.Y.W., (2004). J.R.P. and C.W.R.; OIST/Berkeley group: O.S., T.M., E.E.-G., S.B. and D.S.R. 14. Putnam, N. H. et al. The amphioxus genome and the evolution of the chordate Author Information Genome and transcriptome sequence reads have been deposited karyotype. Nature 453, 1064–1071 (2008). in the SRA as BioProjects PRJNA270931 and PRJNA285380. A browser of this 15. Duboule, D. The rise and fall of clusters. Development 134, 2549–2560 genome assembly is available at (http://octopus.metazome.net/). Reprints and (2007). permissions information is available at www.nature.com/reprints. The authors declare 16. Callaerts, P. et al. HOX genes in the sepiolid squid Euprymna scolopes: implications no competing financial interests. Readers are welcome to comment on the online for the evolution of complex body plans. Proc. Natl Acad. Sci. USA 99, 2088–2093 version of the paper. Correspondence and requests for materials should be addressed (2002). to C.W.R. ([email protected]) or D.S.R. ([email protected]). 17. Simakov, O. et al. Insights into bilaterian evolution from three spiralian genomes. Nature 493, 526–531 (2013). This work is licensed under a Creative Commons Attribution- 18. Zipursky, S. L. & Sanes, J. R. Chemoaffinity revisited: Dscams, protocadherins, and NonCommercial-ShareAlike 3.0 Unported licence. The images or other neural circuit assembly. Cell 143, 343–353 (2010). third party material in this article are included in the article’s Creative Commons licence, 19. Chen, W. V. & Maniatis, T. Clustered protocadherins. Development 140, unless indicated otherwise in the credit line; if the material is not included under the 3297–3302 (2013). Creative Commons licence, users will need to obtain permission from the licence holder 20. Brown, C. T., Graveley, B. & Rosenthal, J. J. Loligo pealeii (Squid) Data Dump (http:// to reproduce the material. To view a copy of this licence, visit http://creativecommons. ivory.idyll.org/blog/2014-loligo-transcriptome-data.html) (2014). org/licenses/by-nc-sa/3.0

224|NATURE|VOL524|13AUGUST2015 G2015 Macmillan Publishers Limited. All rights reserved LETTER RESEARCH

METHODS expanded in O. bimaculoides, were manually curated and analysed. We searched Data access. Genome and transcriptome sequence reads are deposited in the SRA the octopus genome and transcriptome assemblies using BLASTP and TBLASTN as BioProjects PRJNA270931 and PRJNA285380. The genome assembly and with annotated sequences from human, mouse and D. melanogaster. Bulk analyses annotation are linked to the same BioProject ID. A browser of this genome assem- were also performed using Pfam45 and PANTHER46. We used BLASTP and bly is available at (http://octopus.metazome.net/). TBLASTX to search for specific gene families in deposited genome and transcrip- Genome sequencing and assembly. Genomic DNA from a single male Octopus tome databases for L. gigantea, A. californica, C. gigas, C. teleta, T. castaneum, bimaculoides31 was isolated and sequenced using Illumina technology to 60-fold D. melanogaster, C. elegans, N. vectensis, A. queenslandica, S. kowalevskii, B. floridae, redundant coverage in libraries spanning a range of pairs from ,350 bp to 10 C. intestinalis, D. rerio, M. musculus and H. sapiens. Candidate genes were verified kilobases (kb). These data were assembled with meraculous32 achieving a contig with BLAST47 and Pfam45 analysis. Genes identified in the octopus genome were N50-length of 5.4 kb and a scaffold N50-length of 470 kb. The longest scaffold confirmed and extended using the transcriptomes. Multiple gene models that contains 99 genes and half of all predicted genes are on scaffolds with 8 or more matched the same transcript were combined. The identified sequences from octo- genes (Supplementary Note 1). pus and other bilaterians were aligned with either MUSCLE48, CLUSTALO49, Genome size and heterozygosity. The O. bimaculoides haploid genome size was MacVector 12.6 (MacVector, North Carolina), or Jalview50. Phylogenetic trees estimated to be ,2.7 gigabases (Gb) based on fluorescence (2.66–2.68 Gb) and were constructed with FastTree51 using the Jones–Taylor–Thornton model of k-mer (2.86 Gb) measurements (Supplementary Notes 1 and 2), making it several amino acid evolution, and visualized with FigTree v1.3.1. times larger than other sequenced molluscan and lophotrochozoan genomes17. Synteny. Microsynteny was computed based on metazoan node gene families We observed nucleotide-level heterozygosity within the sequenced genome to be (Supplementary Note 7). We used Nmax 10 (maximum of 10 intervening genes) 0.08%, which may reflect a small effective population size relative to broadcast- and Nmin 3 (minimum of three genes in a syntenic block) according to the spawning marine invertebrates. pipeline described in ref. 17 (Supplementary Note 6). To simplify gene family Transcriptome sequencing. Twelve transcriptomes were sequenced from RNA assignments we limited our analyses to 4,033 gene families shared among human, isolated from ova, testes, viscera, posterior salivary gland (PSG), suckers, skin, amphioxus, Capitella, Helobdella, Octopus, Lottia, Crassostrea, Drosophila and developmental stage 15 (St15)33, retina, optic lobe (OL), supraesophageal brain Nematostella. We required ancestral bilaterian syntenic blocks to have a minimum (Supra), subesophageal brain (Sub), and axial nerve cord (ANC) (Supplementary of one species present in both ingroups, or in one ingroup and one outgroup. To Note 2). RNA was isolated using TRIzol (Invitrogen) and 100-bp paired-end reads examine the effect of fragmented genome assemblies, we simulated shorter assem- (insert size 300 bp) were generated on an Illumina HiSeq2000 sequencing machine. blies by artificially fragmenting genomes to contain on average 5 genes per scaffold De novo transcriptome assembly. Adapters and low-quality reads were removed (Supplementary Note 6). before assembling transcriptomes using the Trinity de novo assembly package In comparison with other bilaterian genomes, we find that the octopus genome (version r2013-02-25 (refs 34, 35)). Assembly statistics are summarized in is substantially rearranged. In looking at microsyntenic linkages of genes with a Supplementary Table 2.2. Following assembly, peptide-coding regions were trans- maximum of 10 intervening genes, we found that octopus conserves only 34 out of lated using TransDecoder in the Trinity package. We compared the de novo 198 ancestral bilaterian microsyntenic blocks; the limpet Lottia and amphioxus assembled RNA-seq output to the genome to evaluate the completeness of the retain more than twice as many such linkages (96 and 140, respectively). This genome assembly. To minimize the number of spuriously assembled transcripts, difference remains significant after accounting for genes missed through orthology only transcripts with ORFs predicted by TransDecoder were mapped onto the assignment as well as simulations of shorter scaffold sizes (Supplementary Note 6; genome with BLASTN. Only 1,130 out of 48,259 transcripts with ORFs (2.34%) Extended Data Fig. 9b). Scans for intra-genomic synteny, and doubly conserved did not have a match in the genome with a minimum identity of 95%. synteny with Lottia, were performed as described in Supplementary Note 6. Annotation of transposable elements. Transposable elements were identified Transposable elements and synteny dynamics. The 5 kb upstream and down- with RepeatScout and RepeatModeler36, and the masking was done with stream regions of genes were surveyed for transposable element (TE) content. For RepeatMasker37, as outlined in Supplementary Note 4.2. The most abundant genes with non-zero TE load, their assignment to either conserved or lost bilater- transposable element is a previously identified octopus-specific SINE38 that ian synteny in octopus was done using the microsynteny calculation described accounts for 4% of the assembled genome. above. The number of genes for each category and TE class were as follows: 484 Annotation of protein-coding genes. Protein-coding genes were annotated by genes for retained synteny and 1,193 genes in lost synteny for all TE classes; 440 combining transcriptome evidence with -based and de novo gene pre- and 1,107, respectively, for SINEs; and 116 and 290, respectively, for Mariner. diction methods (Supplementary Note 4). For homology prediction we used pre- Wilcoxon U-tests for the difference of TE load in linked versus non-linked genes dicted peptide sets of three previously sequenced molluscs (L. gigantea, C. gigas, were conducted in R. and A. californica) along with selected other metazoans. Alternative splice iso- To assess transposon activity we assigned transcriptome reads aligned to forms were identified with PASA39. Annotation statistics are provided in 5,496,558 annotated transposon loci using BEDTools44. Of these, 2,685,265 loci Supplementary Table 4.1.1. Genes known in vertebrates to have many isoforms, showed expression in at least one of the tissues. such as ankyrin, TRAK1 and LRCH1, also show alternative splicing in octopus but RNA editing. RNA-seq reads were mapped to the genome with TopHat52, and at a more limited level. Octopus genes with ten or more alternative splice forms are SAMtools43 was used to identify SNPs between the genomic and the RNA provided in Supplementary Table 4.1.2. sequences. To identify polymorphic positions in the genome, SNPs and indels Calibration of sequence divergence with respect to time. The divergence were predicted using GATK HaplotypeCaller version 3.1-1 in discovery mode between squid and octopus was estimated using r8s40 by fixing cephalopod diver- with a minimum Phred scaled probability score of 30, based on an alignment of gence from bivalves and gastropods to 540 Mya8. Our estimate of 270 Mya for the 350 bp and 500 bp genomic fragment libraries using BWA-MEM version the squid–octopus divergence corresponds to mean neutral substitution rate of dS 0.7.6a. Using BEDTools44, we removed SNPs predicted in both the transcriptome ,2 based on the protein-directed CDS alignments between the species (Supple- and the genome and discarded SNPs that had a Phred score below 40 or mentary Fig. 6.1.2) and a dS estimation using the yn00 program41. Throughout the were outside of predicted genes. SNPs were binned according to the type of manuscript we convert from sequence divergence to time by assuming that dS nucleotide change and the direction of transcription. Candidate edited genes were ,1 corresponds to 135 million years. For example, unlinked octopus protocad- taken as those having SNPs with A-to-G substitutions in the predicted herins appear to have expanded ,135 Mya based on mean pairwise dS ,1, after mRNA transcripts. octopuses diverged from squid. In contrast, clustered octopus protocadherins are Cephalopod-specific genes. Cephalopod novelties were obtained by BLASTP much more similar in sequence (mean pairwise dS ,0.4, or ,55 Mya). and TBLASTN searches against the whole NR database53 and a custom database Quantifying gene expression. Transcriptome reads were mapped to the genome of several mollusc transcriptomes (Supplementary Note 11.1). To ensure that we assembly with TopHat 2.0.11 (ref. 42). A range of 76–90% of reads from the had as close to full-length sequence as possible, we extended proteins predicted different samples mapped to the genome. Mapped reads were sorted and indexed from octopus genomic sequence with our de novo assembled transcriptomes, with SAMtools43. The read counts in each tissue were produced with BEDTools using the longest match to query NR, transcriptome and EST sequences from multicov program44 using the gene model coordinates. The counts were normal- other animals. Gene sequences with transcriptome support but without a match ized by the total transcriptome size of each tissue and by the length of the gene. to non-cephalopod animals at an e-value cutoff of 1 3 1023 were considered for Heat maps showing expression patterns were generated in R using the heatmap.2 further analysis. Octopus sequences with a match of 1 3 1025 or better to a function. sequence from another cephalopod were used to construct gene families, which Gene complement. Gene families of particular interest, including developmental were characterized by their BLAST alignments, HMM, PFAM-A/B, and regulatory genes, neural-related genes, and gene families that appear to be UNIREF90 hits. The cephalopod-specific gene families are listed in the Source

G2015 Macmillan Publishers Limited. All rights reserved RESEARCH LETTER

Data file for Extended Data Fig. 10. Octopus-specific novelties were defined as 45. Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, sequences with transcriptome support but without any matches to sequences D222–D230 (2014). from any other animals (,1 3 1023), including nautiloid and decapodiform 46. Mi, H., Muruganujan, A. & Thomas, P. D. PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. cephalopods. Nucleic Acids Res. 41, D377–D386 (2013). 31. Pickford, G. E. & McConnaughey, B. H. The Octopus bimaculatus problem: a study 47. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein in sibling species. Bulletin of the Bingham Oceanographic Collection 12, 1–66 database search programs. Nucleic Acids Res. 25, 3389–3402 (1997). (1949). 48. Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time 32. Chapman, J. A. et al. Meraculous: de novo genome assembly with short paired-end and space complexity. BMC Bioinformatics 5, 113 (2004). reads. PLoS ONE 6, e23501 (2011). 49. Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence 33. Naef, A., Boletzky, S. v. & Roper, C. F. E. Cephalopoda. Embryology (Smithsonian alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011). Institution Libraries, 2000). 50. Waterhouse, A. M., Procter, J. B., Martin, D. M., Clamp, M. & Barton, G. J. Jalview 34. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-seq data Version 2–a multiple sequence alignment editor and analysis workbench. without a reference genome. Nature Biotechnol. 29, 644–652 (2011). Bioinformatics 25, 1189–1191 (2009). 35. Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using 51. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2–approximately maximum- the Trinity platform for reference generation and analysis. Nature Protocols 8, likelihood trees for large alignments. PLoS ONE 5, e9490 (2010). 1494–1512 (2013). 52. Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq 36. Smit, A. & Hubley, R. RepeatModeler Open-1.0. (2008–2010). experiments with TopHat and Cufflinks. Nature Protocols 7, 562–578 (2012). 37. Smit, A., Hubley, R. & Green, P. RepeatMasker Open-3.0. (1996–2010). 53. Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI Reference Sequence (RefSeq): a 38. Ohshima, K. & Okada, N. Generality of the tRNA origin of short interspersed curated non-redundant sequence database of genomes, transcripts and proteins. repetitive elements (SINEs). Characterization of three different tRNA-derived Nucleic Acids Res. 33, D501–D504 (2005). in the octopus. J. Mol. Biol. 243, 25–37 (1994). 54. Palavicini, J. P., O’Connell, M. A. & Rosenthal, J. J. An extra double-stranded RNA 39. Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal binding domain confers high activity to a squid RNA editing enzyme. RNA 15, transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003). 1208–1218 (2009). 40. Sanderson, M. J. r8s: inferring absolute rates of molecular evolution and 55. Huelsenbeck, J. P. & Ronquist, F. MRBAYES: Bayesian inference of phylogenetic divergence times in the absence of a molecular clock. Bioinformatics 19, 301–302 trees. Bioinformatics 17, 754–755 (2001). (2003). 56. Starnes, T., Broxmeyer, H. E., Robertson, M. J. & Hromas, R. Cutting edge: IL-17D, 41. Yang, Z. & Nielsen, R. Estimating synonymous and nonsynonymous substitution a novel member of the IL-17 family, stimulates cytokine production and inhibits rates under realistic evolutionary models. Mol. Biol. Evol. 17, 32–43 (2000). hemopoiesis. J. Immunol. 169, 642–646 (2002). 42. Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with 57. Cummins, S. F. et al. Candidate chemoreceptor subfamilies differentially RNA-seq. Bioinformatics 25, 1105–1111 (2009). expressed in the chemosensory organs of the mollusc Aplysia. BMC Biol. 7, 28 43. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler (2009). transform. Bioinformatics 25, 1754–1760 (2009). 58. van Nierop, P. et al. Identification of molluscan nicotinic acetylcholine receptor 44. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing (nAChR) subunits involved in formation of cation- and anion-selective nAChRs. genomic features. Bioinformatics 26, 841–842 (2010). J. Neurosci. 25, 10617–10626 (2005).

G2015 Macmillan Publishers Limited. All rights reserved LETTER RESEARCH

a b 1 Hsa_ADAT1 Mmu_ADAT1 1 Ocbimv22027735m_ADAT ADAT 0.33 Cin_ADAT 0.9

0.64 Lgi_231337 0.16 Cte_224434 ADAR1 Cte_220233 0.88 Lgi_139693 0.92 Z- Adenosine Ocbimv22010033m_ADAR-like dsRBD 0.88 ADAR-like alpha deaminase 1 Hsa_ADAD1

0.98 Mmu_ADAD1

1 Hsa_ADAD2 Mmu_ADAD2 Cte_171815 ADAR2 0.57 0.91 Cte_228448 Lgi_166687 0.59 Cin_ADAR Adenosine dsRBD dsRBD 0.14 Hsa_ADAR 1 deaminase Mmu_ADAR ADAR1 0.85

0.16 Cte_183692 1

1 Lgi_133731 Ocbimv22018643m_ADAR ADAR-like/ADAD 0.85 Dme_ADAR Cin_ADAR2

1 Cte_176450 Adenosine Hsa_ADARB1 dsRBD dsRBD 1 deaminase 0.58 Mmu_ADARB1 0.98 ADAR2 1 Hsa_ADARB2 0.18 Mmu_ADARB2 Lgi_128560 0.97 Ocbimv22009676m_ADAR2 1

1 Dop_ADAR_B Dop_ADAR_A c d

1500

ADAR1 varia A-C A-G ADAR2 1000 A-T C-A C-G ADAR-like C-T /ADAD G-A G-C

500 G-T

OL OL T-A Sub Ova Skin St15 PSG ANC T-C Number of DNA-RNA differences differences Number of DNA-RNA Supra Testes Testes Retina

Viscera Viscera T-G Suckers

3 2 10123 0 Row Z-score Ova Testes Viscera PSG Suckers Skin St15 Retina OL Supra Sub ANC DNA RNAdiff

Extended Data Figure 1 | RNA editing in octopus. a, Approximate bimaculoides show prominent A-to-G changes. Histogram illustrates the maximum likelihood tree of adenosine deaminases acting on RNA (ADARs) in number of DNA–RNA differences detected between coding sequences in bilaterians. ADAR1, ADAR2, ADAR-like/ADAD, and ADAT (tRNA-specific the genome and 12 O. bimaculoides transcriptomes after filtering out adenosine deaminase) were identified in Hsa, Mmu, Cin, Dme, Cte, Lgi, polymorphisms identified in genomic sequencing. Differences were binned D. opalescens (Dop54), and Obi with Shimodaira–Hasegawa-like support by the type of change (see key) in the direction of transcription. A-to-G indicated at the nodes. b, O. bimaculoides ADAR1, ADAR2 and ADAR-like changes are the most prevalent, particularly in neural tissues and during proteins contain one or two double-stranded RNA binding domains (dsRBD) development, paralleling the expression of octopus ADARsinc. Other types as well as an adenosine deaminase domain. ADAR1 also has a z-alpha of changes were also detected at lower levels, possibly resulting from domain. c, Expression profiles of the three ADAR genes found in 12 uncharacterized polymorphisms. O. bimaculoides tissues by RNA-seq profiling. d, DNA–RNA differences in O.

G2015 Macmillan Publishers Limited. All rights reserved RESEARCH LETTER

H. sapiens 107 kb

199 kb

117 kb

98 kb

B. floridae 448 kb Hox1 Hox2 Hox3 Hox4 Hox5 Hox6 Hox7 Hox8 Hox9 Hox10 Hox11 Hox12 Hox13 Hox14 Hox15

D. melanogaster 392 kb // 320 kb Lab Pb ZenZenZen Dfd Scr Ftz Antp

Ubx Abd-A Abd-B

C. teleta 243 kb // 22 kb Lab Pb Hox3 Dfd Scr Lox5 Antp Lox4 / / Lox2 Post2 Post1

L. gigantea 471 kb

Lab Pb Hox3 Dfd Scr Lox5 Antp Lox4 Lox2 Post2 Post1

O. bimaculoides 421 kb Lab 474 kb

Scr 751 kb

Lox5 53 kb Antp 137 kb Lox4 437 kb Lox2 231 kb Post2 187 kb Post1

Extended Data Figure 2 | Local arrangement of Hox gene complement in on three scaffolds17. L. gigantea has a single cluster with the full known O. bimaculoides and selected bilaterians. At the top, the four compact Hox lophotrochozoan gene complement. In O. bimaculoides many of the scaffolds clusters of H. sapiens and the single B. floridae cluster are depicted. The are several hundred kb long, and no two Hox genes are on the same scaffold. D. melanogaster Hox complex is split into two clusters. We included genes in The positions of O. bimaculoides genes approximate their locations on the D. melanogaster locus that are homologues of Hox genes but have lost their scaffolds. Dashed lines indicate that the scaffold continues beyond what is homeotic function, such as fushi tarazu (ftz), bicoid, zen and zen2 (the latter shown. Scaffold length is depicted to scale with size noted on the left. Genes are three are represented as overlapping boxes). Hox genes in C. teleta are found positioned to illustrate orthology, which is also highlighted by colour.

G2015 Macmillan Publishers Limited. All rights reserved LETTER RESEARCH

Extended Data Figure 3 | Gene complement and gene architecture Spu, S. purpuratus; Hma, Hydra magnipapillata;Adi,Acropora digitifera. evolution in metazoans. a, Principal component analysis of gene family For methods, see Supplementary Note 7.4. b–d, MrBayes55 tree (constrained counts. O. bimaculoides highlighted in green. Deuterostomes are indicated in topology) on binary characters of presence or absence of Pfam domain blue, ecdysozoans in red, lophotrochozoans in green, and sponges and architectures (b), introns (c), or indels (d); scale bar represents estimated cnidarians in orange. Xtr, Xenopus tropicalis;Gga,Gallus gallus; Tca, Tribolium changes per site. For methods, see Supplementary Note 7.3. castaneum;Dpu,Daphnia pulex; Isc, Ixodes scapularis;Ava,Adineta vaga;

G2015 Macmillan Publishers Limited. All rights reserved xeddDt iue4 Figure Data Extended eta ri,O n N,wieteohr hwmr ie distributions. mixed bars. more show others the while b ANC, and throughout OL expressed brain, highly central are protocadherins transcriptomes. octopus the 12 of three-quarters in Over 30672 Scaffold a on located expression. genes of protocadherin sites and sequence in similar hlgntcte ihihigSafl 07 rtcdeisi grey in protocadherins 30672 Scaffold highlighting tree Phylogenetic , RESEARCH

H H H

H sa

H sa s_

H

sa H

Hs Hs

H H s H

b a

H H

H s Hs

s s

_ p H

Hs a H H

s

Hs _ H H H

H

sa_PC a _ s

H H s a s a

H c

sa_P a P a

H H a

_ H H H

H H s

P s

H

a sa

_ s

a

P H s s

a s

_ d _P a

_

sa _ s Hsa

P _ C a

a_P s a

Hs

a s _ s

H s

s H C s a

P s a_ a s _

_ s a

h

_ C P

a sa H P

H PC H a_P P

_ _

_

C a_PC a_PC a_PC a_P P

_ a_PC

a

D a

a _ PC

s _ _ P 1

H C D s

_PC PC

_

C

D C

sa CD PC P

s

P

s P CDH_

_ a

_ H P P a _ C P D P a H_ 1

C _

s C D H P a

H a _

DH D

D H C

C _

PC PC

_ C C PC _PC

_

CDH sa C

C C

a_ D H PC DH

Hs _ D _

_ D C P

H D

D _ C

s P P

H C X D

DH_ H_ _ D

H DH_ H_ D

PC

D g D

P H_gam _ D P

a_ D g CDH_

H D H

H_ _ H _ C H CDH g D

P l

D D D _al D _ D H_

D H s D

a a D

H g i

C CD H

a D H

H P H g H _ n

a _

_ H

s _ga D H a a

CDH_gam g H

D

_ m a

a H

H

H H H _

P a

H

mma H

H D

a

_

a _al

D ke

_

H _ C m a s a g Hs

_g _

_P al

PCDH_ a lp l

_ H_al

m s

al a

ga l Hs _bet H

C g ph

_bet l

p _bet _bet

_bet b

a_P m m _bet

_ _

H a

p _ b

a _ lp

H b H_

p D H

mma _

m b a l

a p h

_ d

h

e

b _ P b lph

m pha D p h

mm e b

am e

mm a

h

CD _

b pha

m a _ _ a e h

H m s

m h

a a- a

g _ a-9

eta t a

a e h

a

C e t

m t _ a

H_ga a

e P a_

g - a a lp

a t a

c

a

a

- a _P -9 X

_ a a H

C lp

- t C5 a a -7 m p t

m PC

a l

D C -4 1

a t l

a- a- a a-

a- - a- - -

a C

ma-A -1 ph

C a

a- -

H g mm -1 ph

- h

a P_

a s

-

A -

- h

1

D a-B -B B 3

- P 1

h

- _

m 3_

B -

a-A _

-2

H a 4 - - 2_

10

a 1 _ A

D

CD

a-

a

3 1 - _

_ -A 1_

a

4 5 11 6

12 3 _ 5

8

g B6_ m 7

3 a

2 _ 2 1 H _ 0

D 9 2

-B 1 _ a-C

mma 4 1

_ a

m _

1 1

gam _ m

-C1

7 0 H

a 1

_1 - _

1 P

m 1

_ m 8 p _ 1 1 a _

g -

H- 1 P

5_ 1 6 1

_ 0

xrsinpoieo h 7pooahrngnslctdon located genes protocadherin 17 the of profile Expression , m a _1 H-19 1 12 - 5 4 p

_

gamma- C

1 hi

a ma- - 1 1

1 _1 a- 1 C 1 2

-A A C

hi _ _

2_

ma 1 1

m 1

_ DH

C 0_ 1 _ 0 7 C

P

m D _ 1 1

- A 9 t 9_c

C 1 PCD 1 _1 e_

t ma A 8 4 C

1 t _1 1

C e_ a- 8_ H

A 1 e_

t - _ 1 _

-17 Sca old 30672

5

C e A D

te - 3

2_ 2 13

A7 _ a

C _ 2 - 1 1 1 H

t 1

A 1 9 e 2

C _ 2 25 _ H

t A 3

0 1 9

e _ 6 1

C 1 2 _ 1

t 2 9 . 4_ 609

e_2 _ 2279 _ 1

te 82 2 7 1 2 1 5

Ct 4 2 Ct _ 2 1 2

25 1 3 2 2607 2 2 Ct e_1 7 C e8 1 5 2 9 8 e_ t 8 6 C 2 78 C e 76 8 C t 2 72 3 5 e t 05 9 7 C t _ e 1 e 88 17 0 t _ 2 3 e 2 3 6 H C _ 2 0 2 C 1 8 3 s t 1 6 0 H a_ e t 5 96 9 _ e9 69 3 Ova s C 6 a 19 3 PC t 26650 _ C e 3 7 P C C t 3 5 4 C D e 1 H t t 6 2 D e e9 6 3 -1 _ 2 3 H L 1 7 8 -1 1 686 5 _ L g 4 31 1 X-l g i7 4 L _ i 1 5 0 Y 7 7 4 g -li ink 1 7 i_ 7 08 1 n e 03 L 2 ke d gi5193 d _ L Lgi 0 _ a g 9 a L i_ 8 2p gi_ 2 202 0 6 1 9 5 o L L 39 4 R g g 5 7 o R i_ i6 0 1 R 9 1 2 3 R 0 3 4 4 Testes 4 8 9 2 5 _ 0 7 0 C 15 L _ 7 g C _ LETTER L i5 7 U g 2 X L Lg i7 0 2 g 1 6 i i5 6 2 L _ 16 9 gi 10 2 Lgi _ 4 8 1 6 3 _10 04 32 o 8 R Lgi_10Lg 5 9 R74 i5 11 0 o 1 0 RR 6_ 52 24 C7_ 5 6 o oRR 11 R 2_ 2 oR R C Z R oL 8 853_C6_Z7 UV oA 855 4 27_ _Z 87 V 83 _C _ C6 Viscera oA 0_C C6 _Z 6T _ 854_C 4T X_IZUOIZUWU P W oD BF o 44 6TF X V A8 3 _ _O W 5 _C6 Z WJQWJ Q 2_ T oA UOWQVOWQ C6 S 85 Q o TTSGS FX 1_Z A8 G _Z V 41 FX_X_Z UO U _C Z WQVO 4ST UUOO o oA F W A8A8 84 _U QV 43_C 4_C5OWVOW V 55XX _ o _ ZU | A Z o 84 UO A8 8_ W oD41 46 C5_ _C5 Z PSG oA8 1 rtcdei ee ihnagnmccutrare cluster genomic a within genes Protocadherin _C5_ _ 39 ZU _C ZU op 5_Z5_Z O o oA A840 UO A8 842_ _ W oD440_55_ C C4 C5 4_ZZUO_V C6T TFX_ UO SDF ZUO W oA X WV 853_C6KT53_C6 oD4 _ZOW 38 QV KTS _C FX_ 5_Z oD ZUO U 439_ WQ C4 V oRR T_ZV 550_ oD4 oRR C3_ 37_C6 837_ Z o TSF_ C4_ Suckers RR0 Z 27_C6 ZOW oA828_C6A SF QV _ZUOW oA827_C6SS_ZUOW V JQ ooAA826_C _ZUO V 6T WJQV oA833 _Z_ZUO _C6TX_ WJQVJQV oP0 IZUOW 88_C6T_ JQV oN IZUOWJ 659_C6 QV T_IZU oA832_C6_OWJQV oA ZU 810_C6TF OWOW _ZU_ZUOWJ opT761_ QV opA8 C3_OWJ Skin 24 _C5TF_U oA82 OWJQV 0_C6TS _ZUOWJQV oA819_C6 SX_IZUOWJQJQ oA811 _C6TSF_ZUWJQ

50.0

o _ _

o R St15

oM065

_C6T

FX_IZUO

W

J Q

o

RR730_C6

TX

_ Z UO W J Q V

o

pR

R

1

6

1_

C

5

_

I

Z

o U

T O

0 W

7

8_ JQ

C

5

_

Z U

o N

6

58_ Retina

C

6HT

S o

_ M

I

ZUWJ 06

1_C

Q

6

V

T

K

o

D M

F

xrsinpoieo h 31 the of profile Expression , 0

_IZ

5

5_C

UO

6 W

T_

J oRR1

Q I

ZU

V

O

6

W

9_C6

J

Q

oR V

T

S R _

549

ZU

OWJ

_C

o 5TS

QV R

R1

_I

72_ ZU

O

C o

W

6

R

J T_IZ

R1 Q

V

1

G OW

7

_

C6

oRR

JQ

T

V F_Z

7

99_C6

UOW

o

R

R B

JQV

X_IZ

7

27

_

o UOW

C

RR2 6

T_

JQ

18_

IU

V

W o

C P

JQ

6 0

T

2015 93 V

_ZU

_

C o

R OWJ

6

T_I R519

Z

Q

U

V

o OL _ OWJ RR

C6

T 5

Q 69_C6TS S

_ V o

Z R

U

R

O

1

WJ

7

5 oRR5

F_ZUO

_ Q

C

V

6T

2

S o 1 WJ

_ZU

RR7 _

C6TSF_

QV

W

3

o

J 1

RR Q

_

C V

Z

6TSFX2 68

OW

o

3 L4

_

J

C6 Q

84_

V

_ o D

IZO

RR T C6_Z

S

F 4 WJ

X 4

o

7 H85 4

Q U

_

_

V C6

I

Z

2_C6 W o

T RR5

Q

SX_IUO

V

S

oH

7 G

1

FX

_ 84

C6 W

oRR _ZUOWJ

7

J

_C SG Q

V

6 682_C _

o

X

ZOQ H8

_IZ

V

4

o

U

9

6 R Supra

_ OW

KT

amla ulsesLmtd l ihsreserved rights All Limited. Publishers Macmillan R C6

7

oRR G_Z

V 6 HTB_

0

_

U 277

C

o

R

O 6

I H

Z R

W

_C

U

T

2 o

S

7 W RR7 6H

F

6

Q

_

X

o

T

C

_

H

S 6

I

6

Z

3

E

8

H

U

_

_Z 5 o

TSF C O

C 4_C6T

O

6 W

8

D

W

oI 18_ JQ

X_

F

J

3

X_

QV

V

Z

21 C

S o

OWJQV

Z

6

N65

F

_ U

SF_

_

C

OWJQV

IZ o

3

I

7

UO 3

Z _

_

1

U I o

C4DT

Z

8

W RR31

O

_

W Q C5 o

V

pR

J

S V

D_

4 o

F_

R8

_C3

I

3

I

I

Z

ZUO

1 o

8

5

I

6 X

3

_

_

_I 1 oC

C

C W

6

Z

2 Sub

2

_

J

81

o X

X Q C2

R

_

V 7_

I R o _ Z

9 T C6 I

ZW

53

675

o

M

E8

_

SF

C o

_

1

C

6 C6

6_ _Z

M

819 oF

_ S C6D

UO

9 IZUOW

o N

_C

7

F

F

5 W

9

o _Z

S

_

6T

8

T

J

_ C

2

Q

UOW

6

o

I

S

6 Z _C

71

V F9 J

S E

U

Q oR

B _ W 6 _C

77

Z

J

N

T

o

R

J

QV

UO

SX_

X

_C6MSFX2_ 6M R Q

12

_ o

V

R

I WJ

F

Z

S

6_C

7

Z

o

9

U

F

1

U R

8 Q

O X

2

oB

O

5 R8

V

6SF

_ _Z

WJ

_

W

C

o

3

C

2

U

K7

5 JQ

28_ QV

6

6 o

_IZ

O

_

_C

T K

UO

U

4

V

W

o

M

7

6_ ANC W C O

K

6

J

4

o F

W

6

W Q S

J 7

7

C

B

_

D

QV

P 5

V _

o JQ

V

I

6

32

T

Z 3

C _ K

T

oB

S

_ Z UOW

V

7 6

P 9_

C U

F

S o 49

SF

332_C

_U

O 6

R

F

C

oL48

T

_

WJ

_

R8

D 6S

QV O

C S o Z

_

B U WJQ

N

6

V

o

N IZ

8

M

O

3 F

B

3

2 6T _I oB309

WJQ

1 _

W S _

3

_

ZUOW o

Z

2

C3 V _Z

D 11

C4_

J

B310_ U _

o

Q

S

C4

B O

V WJ

o _C _

F_Z V

3

B W

Z

o Z

_C 0855m 0854m 0853m 0852m 0851m 0848m 0846m 0844m 0843m 0842m 0841m 0840m 0839m 0838m 0837m 0836m 0835m 0833m 0832m 0830m 0828m 0827m 0826m 0824m 0822m 0821m 0820m 0819m 0816m 0811m 0810m

_

1

3 RR0 V 3

J

oB

Q

Z 6

U 1 Q

_

o 4 C

V _

8

Z O

V

317 T _Z o

C

4

_C6 3

W

7 B 8 o _Z

6

5

4 B oB 3 S

J

_

_

0 Q o

2

32

D

D C

C6_

_ o

B 7

V

3

_

2

6

o

C

B _

2

3 26

IZU

_

DSX_

oJ0

F

C4

_C

3

4 2

Z

o I

9

2 Z X_IZ _ 4

U o

B

O 7

3

UO S

C _ 4

o

p

W 51_C

33

3

_ oJ S JV C

_

G

5_ RR5

o

_C I

C4

Z

J X

ZU

6

W o

8

3

G8

05

QV

V o

_ I D

J

1

_C

Z

o

J

3

S G Z

0

W

oT 5

2

3 J 4

3

HT

oG 0

V _

5

_

X 81

o 0

6T 4 D

_

JQ

9

Z

o

2 2 C

G

5 1 _

_ o

C

_ A

_

S V

R o

2_C

8

I 4 C5

6

8 oE

NF_

T

V

C

80

Z 3

op C6TDN _Z

R

X Row Z-Score

10-813 R0 o

_

TD

6

o 0

_

Q 6T

o

2

C

R p

o

_C

T

096_C 8 o

p 9

Z

C

o

pC

_U

o V

R

6 p

o

6

DNF o 26 5

_ N

I E09

C

o 6

o

D

E o

o

Z o

oE098_C4MR_IZV 8

C

A

E

T A

C

7 R R

C 3L _

F_ C

8 p

R

U R 28

N

0

O

8

T

_C3A_ 8 1 T 6 0

3

C

5 R

2

R

8

8 9

O

29 R F

N R

F_ W

0 _

3 N

3

SX

Z

3 _ _C

7

X_

5

_C

3 2

8

3

7

R

_

W C 0

2 SF

9

4

_C 0

U Z

F 8 6A

_

V

_

6

9

1 _ Z _

Z

_

_

6 4

_

9

2

_ _C

_ UO O

C 6

J

Z

_

4

_ C o U 3DR

U

C3

C 4 7 C3

Z

_Z 2

_

U

QV

6

T _Z

C _

WJ

3 C

T

O 3

O _ 1 U

Z _

Z

T

3

DN 3 O

C2 D

W

8

2

D

2

C

U _

C DR_

W

WJQ

O D

_

UO D X

0123 2 1 10

WJ

Y 9 R_

V X

C

S

O R

J 2

I 2

S W 2

R

J

D

8 Z S _

QV _

Y

_ 3 _

_ W

Q

_

_

_

_

W

F I

Y_ UO Q J I

UO

A

I

_ Z

Z

UOW

I I Z

V Z U C

V

Q

JQ _ZUO

Z

Z

I

U _

V

Z

OQ

3 U

I V

I

W Z OW

W

Z

V _

O

IZ

V

V

J

V V

V W

V

J

Q V

H H Hs

H sa s

cfodtn ocutrtgte ntete.Odro h ee nteha maps heat the in genes the of Order ( tree. the on together cluster in to seen tend scaffold As bars. grey in protocadherins 9600 sample. most St15 is the which in Ocbimv220039316m, expressed highly of in exception expressed the highly with most tissues, are protocadherins nervous these of all Almost 9600. Scaffold H s

H H

H Hs

H _

H sa a H

Hs H

H s a H

s s

_ p H

H s H Hs

s

Hs _ Hs

Hs Hs

Hs

s a _P sa

H H s a a

c

H s

s a P a

H H a_

_ Hs Hsa H

Hs H P s

H

s a _ s

a Hsa a

_ d a_P

a a _ _

s _ Hs

P s

a

C s a

a a_PC

H

a _ s _P

H s

H C sa a

P a a s _

_ s a

P h CDH_gam PC

a _ a H P

H P H a P _

_ _

_ PC C a_P a a

a

D a_PC

_

a_PC _PC s

P _ _ s 1

Hsa P C D sa

_ PC _

C

a

C sa _ CDH_

PC P

PC s s _PC CD

_

PCDH_ _ _

H _ a P a C D P P

H 1

P C _ C

PC d c

D H CD

a_

H a P _

D D

PC C

D P P

_ P _ _

sa_PC C _

CDH_ CDH_ D H P D

H C _ _ D

_ DH_ H P D _ CD

s P P

CDH_ PC H

H

CDH_bet Xlin CD D

H DH H C D

_ D ,

H

P D

C DH_

H g P

s P _ H

a_ D g C

D H

H_al

H PCD C H _ C

D D _ _ H

_ H_a

DH s D

a a D

g H C

_al C

CD _ a DH_ H_

H _a

H g D

H _

sa_PC _ D H a a a

D g H D

_ m a

a H H

P _ H_ _

m H

g H D

_bet

a D k

a

sa g H

_ _g g H_

l

PC a l

lp

_ H

H m s

g a Hs

_ H

p

CD g a p l

_b _b e

mma m al

_

_bet H a

pha b

a _bet lp

P H a m b H ph D m H s

m b a_P

l

a lph

a b pha-

_ h m h

_ d_

H_g h _P b

b

b _a

l

m p

m e

a e m a

CDH _

b p

_

a p _ a e h c

H m m sa a e a a a

D g ma a

e et a

e e

e a- h

t

m _PC

ma t a

H_g a

e ha

a

- h g l a-1

ma a ta a-

m a l

-C t a _ - X -

_ a Hsa

ph -9 t ta l

C H ta t a C a -7_

ma m p

a l

-4_1

p

D _ 9 1 t lp

a a

- a-1 a a-

a C a-8

ma C3_ a-7 - p

ga m - - P

a P_

- A -

a - h

D _g a - B2 3 - 1 P

12_

ha

5 - _ m 1 3

B - - -1

-

H -2 _

a 4 h 2 - 1 10

A

h

D C

B 1 - a

_ - 1_

mm a

5

- 4 6 3 _ 5

m B6 m

_ 3 1 _ H _

a 0

D

- 9 2

A1 - a 1 _ _

mm B5 4 1

_gam a

-

ma 1 _PC

g m

- 00

7 1 H

olw h reigo h orsodn scaffold. corresponding the on ordering the follows ) -A DH

a A1 B4 2 1

_ - 1 P

m 1 8

-

_ C p

_ 1 1 a _

- H-

a 6

_ 1 1 _

m C

1 1 - 5 phi

_ _

ga C

1 h

m -A 1 1

1 m 10 _ _

a 1

1 274 1

-A C

2 _ _

2

1 1

m a- - i

19

_ D

_ C 1 1 -A8 0

Ct

P

1

m a- D _ 1 1 _1 m 9 te_13960

C 1 P _1 1 _

m a 8 H- t A5_ 9

_ 1 C

H-

C e a _ m A e 1 CDH

t - _2 1390

1 _a a- D

C e _225 - A1 _1 c 1 t _

a- A7_ 2_

C e _22 1 7 H t 13

A6_ 1 Ct e_ _ 2

t A _ 1

e 1

C 1 1 2 95 4 .

e_ _ 2 1 t 8 2417 7 1 e 2225 27 _ 1

C 2 2

1 2 C _ 2260 9

2 3 t 2 2 C e_17 t 9 5 Sca old 9600 C e 1 26 9 t 823 8 e t 8 C 7 C _ e 75 8 C te 2 7 605 8 te t 1 21 90 7 C e _ te 2 3 8 _2 3 8 72 6 C 2 H C _ 8 3 00 t 15 1 H sa e t 9 9 6 e 6 3 s _ C _ 92 6 6 a P 1 0 3 9 _ Cte t 9 C e 66 3 7 PC D C Ct 32 5 4 t 1 H e_ 6 3 5 D e 2 6 H -1 96 8 3 1 L 14454 7 -1 g 3 5 _ L i 8 1 Ova 1_ X g 7 6 L -l i 1 0 Y 7 7 g -l in 1 7 i_ i ke 7 0 1 n 03 8 L 2 k d gi 3 ed _ Lgi Lgi82 0 a 51 9 _a L g _209451 96 2p i_ 1 027 5 oRR90L L 3 g g 9 o i_ i6 0 R 1 2 3 R 3 4 4 4 8 9 27 5 _C7 0 0 1 L _ g C _ 5 L i5 7 U g 2 X L i71 0 2 Lg g 6 i i5 6 2 Testes Lgi _ 1 9 1 6 2 L _104890 8 g 4 3 i_1 6 32 oR L L 05 R g gi 1 0 746_Ci_ 5 1 oRR 1 15 0 05 2 o 2 1 6 oRR R 42 7_Z 12 oR R _C7 U R oL 827_C853 V o 85 48 _ A830_ 5 7 _C ZV _C _C6 6_Z 6 oA8 6 _Z 5 C4T TX _I U 4 _IZ ZU W o _C6 PB W D4 F UO V oA852 43 TF_ X_O WQ _C ZU W _C6TS6T oA O J Viscera SF 8 WQ Q o X_ 51 A8 G Z _Z V 41 FX U UO _C _ZU OW 4S O QV o o TF WQV A843_C5A8 _U 44 O _C WV X 5_ oA8 _Z ZU oA 4 UO 846 8_ W oD C5 41 _C _Z oA 1_C 5_ 83 ZU 9_C 5_Z op 5_ZU UO oA A8 O oA8 84 40_ W oD 55_ 2_C C4 440 C5T 4_ _V PSG _C F ZU 6TS X_Z OW oA8 DF UO 5 X_Z WV 3_C oD OW 6KT 438 QV SF _C5 X_Z _ZU oD4 UOW 39_C QV oRR5 4T_Z 50 V oD oRR _C3 437 837 _Z _C6TS _C4 oRR0 F_ _Z 27_C6S ZOW oA82 F QV 8_C6 _ZUO oA AS_ZUOW WV 827_C o 6S_ZU JQV A826_ OW Suckers C6T JQV oA833 _ZUO _C6TX WJQV oP0 _IZUO 88_C6 WJQV o T_IZUO N659_C6 WJQV T_IZUOW oA JQV 832_C6 oA810 _ZUOW _C6TF_Z UOWJQV opT761 o _C3_OW pA824_C5TF_U J OWJQV oA820_C6TS _ZUOWJQV oA819_C6 SX_IZUOWJQ oA81 1_C6TSF_ZUWJ Skin QV o d p T235-14_C5_U0JQ oA8 16_C6HTBFDX_ZUOWJQV opT235-12_C5_U0JQ

hlgntcte ihihigScaffold highlighting tree Phylogenetic , oA821_C6TD_ZUOWJQV

oA822_C6T_ZUOWJQV o R R 8 0 4 _ C 6 _ I Z U O W V

o A 8 3 5 _ C 6 _ Z U O

o

R

R

2 6 1 _ C 6 T X _ I Z U O W J Q

o

M0

6

5

_ C

6 T F X _ IZ U O

W

J

Q

o R

R

7 3 0 _

C

6TX

_ ZUO

W

J Q

V

o

p

T

7

6

1

_ C 2 D

TS

_

OWJ

Q

V

o

A St15

8 3

7

_

C

6 TS

X

_

ZU

OW

V

o o

A8 A8

3 3

8 8

_

C C

6 6

T T

S_ S_

I I Z Z O O

W

V

opB op

B

320_C6 320_C6 o

pB

3

2

0_

C6

T T

T

S S S _

_Z _

ZO

Z

W O O

V W W

V V o o

A A

8 8

3 3

6 6

_ _

C6 C6

T T

X X

_ _

Z Z

O O

W W

o J J

R Q Q

R

V

7 8

3

_

C

6

T

_

ZU

o JQ

p

R

V

R1

6

1

_

C

5

_

I

Z

o U

T O

0 WJ

78

Q

_

C

5

_Z

U o

N 65

8_C

6HT

S_I o

M

ZU

0

6

1 W

_

J

C

b QV 6

TK

oM0

D

F_

5

I Z 5

U _

C6T_ O

W

JQ

o

R

I

Z

V

R

UO

1

6

W 9

_ J

C

Q

6T

o V

R

S

R

_Z Retina

5

4

U

9

rtcdeiso h same the of protocadherins , OWJQV _

C

oRR17 5

T

S_

I

Z

2_C6T_ U

OW

o

RR

JQ

1 V I ZO

1

7

W _

C6T o

J

RR

Q

V

F

7

_

9 Z

9

U

_

O o C

R W

6

R7 B

JQ

X

2 _

V

I 7

ZU

_C

o

R O

6

R WJ

T

2

_I

1

Q

U

8

V

WJ _ oP

C

6 0 Q

T

93_ V

_

ZU

C6

o

R O

T R W

_

5

J I

1 ZU

Q

9

V o

_

O

R

C

WJ

R

6

T 5

Q 69_ S

_ V o

Z

R

C UO

R1

6

T

W 7

SF

5 o

J

_ R Q

_

C

Z R

V

6

U

5

T

O

2

S o 1

W

_

R _

Z

J C

R7

QV U

6T

W

3

o SF_ZO J 1 OL

R Q _

C R V

6 6

T 8

oL

W

S 3_

F

J

4

C X Q

84

2 6 V

_

o D _C

I

R

Z T

R O S

6

F 4 W

_

X 4

o

Z JQ

7_ H85 4

U

_

V

C

IZ

6

2 W o

T _ RR

Q SX_I C6S

V

5

o 71

G

U

H8

O

F

_C6

X

W 4

o _

7_ J

Z R

S

Q

UO

C

R6

G

V

6

_

o

W

X 8

Z

H

2

O _I

J

8 _C6

V

Q

ZUO

4

o

9

RR

_

K

C

W

T

7

o 6

G_

V 6 R H

0

R T

Z

_

B

2 U

C

o

_ 7 R

OW 6

7 I H

ZU R

_

T

27 oRR C

SF

W

6

6

HT

Q

_ X

oH

7

C

_

SE_ 63

I

6

Z Supra

8

HT

U

_

o 54

C O ZO

C

S

6 W

_

81

F

D

C6T

W

o

J

X

FX Q

8

I

J _

3

_C

Q V

Z

2

_ S o

O V

1 ZU

6SF_ N F

W

_

_IZ

6

C

JQ O

oI

5

3

W

7

UO

318_C

V Z _

_

J

UOW I o

C Q Z

W R

4D V

R

QV

o

TSF 3

p

J

5

14_C V

R

D

o

R886

_I

I

_

3

I

Z

Z 1

o

3X_

U 5_ I3

OW

_

16 o

C

C2X

I

C8

Z

2

_C2_

JQ

o

X

1

R

_

V 7_C6M

IZ R oT

9 IZW

5 6

oE

3 7

_

5_

81 S

C6 o

F_ZUOW C8

C

6_

MS

oF

6

1

_I

C Sub

9

9

o

NF

6D Z _

75

F

UO

C

9

o _

S

_ 6

8 ZU T67 J

_I

C

TSE_

WJ 2

Q

o

6

Z _

O V

F

S

C

U

1_C6

Q 9 oR

W

B WJ 6

7

Z

J

NX T

7 oR

R

Q

U

S

_

QV

126_ V

O

X C6 MS _ o

R

I W _

F9

Z

7

Z

o

M

UO

J F

1

U

RR8

8 Q

C

X S

2_

o o

O 5

V 6

_ F W

B B

_C6T

W

S

ZU C

X o o

3 3

J

2

F

K7 2

5_U

J

2 2 QV

6

oK

_IZ _

QV O

8 8

_

UO

4 _C _C

W

C6 o

MF_IZ 747_C6S

6 W

O

K7

J

_

o o

WJ 6 6

W Q

S

J

C

B B

D D

QV

P 53

V o

V

6

3 3

TS TS

Q

_ K

T

2 2

o o

_ ZU U

V

74

P 9_ 9_ B B

C6

F F

OW o

S

3 3

_U _U

O

9_ RR8

F

C C

FD o

3 3

T

W

_

6S 6S

2 2

L482_C

QV O O C SNF_ZUOWQ o o

Z

_ _

J

_ ANC

B B U

W W

6

V

oB oB C C

N N I

8

ZWJ

M

O

3 3

J J

3

6 6

_ _

oB3 oB3

1 1

Q Q

W S _

T T

3 3 I I

Z Z oB oB

2 2

C3 V V

_Z D D

1 1

J

UO UO

Q

_ _

o o

1 1 QV

S S

4

09 09

C C

310_ 310_ B3 B3 V

W

o o _ _

_

F_ F_

_

4 4 WJ WJ

B B C C

Z

o

Z

_ _ J

_ _ 1 1

Z Z 3 3 RR V 3 3

C4_ C4_

o o

Z Z

6 6

U U 1 1 Q Q

_ _

B B

o C4 C4

V

_ _

8 8

Z Z

O O

V V

3 3 T oB oB

0 C C

_ _

17 17 WJ WJ

Z Z

7

8 o o

_ _ 6 6 C C

5

4 B B oB oB 3 3

S S Z Z

6 6

_ _

_

0 Q Q o o

2 2

3 3

D D

D D

C C

C

_

o o

B3 B 7 7

2 2 V V 3 3

_ _

2 2

6 6

6

o o

C B B

_ _

2_C4 2_C4

3 26 26

I

I 39333m 39332m 39329m 39328m 39327m 39326m 39324m 39323m 39322m 39320m 39318m 39317m 39316m 39312m 39311m 39310m 39309m

_ _

DS DS

Z Z _ oJ

F

C4 C4

3 3

4 24_C 24_C

Z Z

o o I

UO UO

9

2 2 Z

X _ _

U U

op op

B B

0

X X 7

3 3 U S S

C C

o

_

W W

5

333_ 333_

3

_ _

_ _

o SX SX

J J

_Z _Z

O

G I

5 5

RR RR

1_

Z

o

_ IZ IZ

V V C4 C4

J

J J

_IZ _I 6 6

WJ

oJ

8

G8

C

0

Q Q

V V oG81 U U

_ _ZV

D D

1 C

Z

o

54 54

5

3

S S V V Z

052

W W

oT 5_

C6 C6 2 2

3 J

3 H

o 09

V

_ _

X X_

oG8 0

4 4 D

_

J J

G

TSX Z Z

o

C

54 186 _

Q Q

T T _C _C o

C3_

_ A

_C

V V R o

2_

8

I I

6 o

N N T0

V V

C

Z Z

o

_Z

R

1 R o _

T

E0

5TD 5T

o

F F 0

Q Q

p

6

o

C

6

2

0 C6

R p

o D

_

0

8 oE

p _ _ 9

Z

C

T

o

p T

_

oC V V

-

R

6AT

p

o

C

2 o 5

_

N

IZ IZ 9

C8 o 6

8 o

DN

E o

C8 U oR

D oRR092_C2_I

oE098_C4MR_IZV 8

E

A T0

C3

6

7 R R

6

NFX_ C

3 _

F 0 1

C

p

U U

2

0

O

N

832_ _ 1 T 6

_

L_

93_C C 3 _

5 R

2

R

8 8

8 9 OW

OW

2

F R N

F W C

0 _

NF

C 3

S Z

_

7

5

_

3 2

8

3

7

9

R _

_ 0 C

SF

9 4

U C Z 3

X

8 6

_

V C

_ 6_

9

1 _ Z _

Z

_

6 44_C

_

A

2

_Z _ A_

_ UO O

C 6

J J

Z

4

_ C3 oT8 UO 3

U

C

C

7 C3

_

_Z

C

U Q Q

_

6T

TDN

C

_

WJ 3

C D

O

Z 1

U

3

Z

ZU 3

3 O

V V C DR

W 2

R DR

2

U _

D

W

W

O D

_I D X

W

2 Y 9

V X

C

S

O

J 2

R

S W

O

2 R

J

D

J

8

Z S

_

Q

_

_

Y

JQ _ 3

_ W

QV

_ Q

_

_I

_ _

W

F_ I

Y U JQ I

U

V A

_ Z

Z U

IZ

Z

Z U C

V

JQV

Z

_

O

IZ

U O _IZ

V

OW

ZU

O

3 U I V

W Z

O

W

_

O Q

W I

O

V

V

Z

J

V V

V W

V

J QV LETTER RESEARCH

a b A902 Mmu_AAX90603-1 A903 Hsa_AAA59134-1 A904 Mmu_EDL28238-1 Mammalian A905 Hsa_AAA74137-1 IL1 , 1 , & 7 A906 Mmu_AAI10554-1 A907 Hsa_AAH47698-1 A908 A910 Mmu_NM_145837 A911 Hsa_AAH36243-1 A912 Mmu_EDL09761-1 A914 Hsa_AAF28104-1 A915 Mmu_AAK59816_1 A918 Hsa_AAG40848-1 Mammalian A919 Mmu_EDL11677_1 IL17 A920 Hsa_AAF28105_1 A921 A922 Mmu_EDL14378-1 A923 Hsa_AAH67505-1 A924 Mmu_AAQ88439-1 A925 Hsa_AAH70124-1 A927 Cgi_KJ531893_1 A928 Cgi_KJ531897_1 A929 Lgi_172928 B403 B404 Cgi_KJ531894_1 B405 Cgi_KJ531895_1 B406 Lgi_152638 C804 Lgi_152641 C805 Lgi_152639 D104 Annelid & non- Lgi_176347 E183 cephalopod Lgi_228210

mollusc OL Sub Ova

Cte_199819 Skin PSG St15 IL17-like ANC Supra Retina Cte_207036 Testes Viscera

Cgi_KJ531896_1 Suckers Cte_209751 3 2 10 1 2 3 Cte_226557 Row Z-score Cte_209750 c Cte_209765 Cte_210775 Cgi_ABO93467-1 oIL17L_E183 oIL17L_A908 oIL17L_A910 oIL17L_A907 oIL17L_B404 oIL17L_A906 oIL17L_A904 Octopus IL17-like oIL17L_D104 oIL17L_A928 oIL17L_A927 oIL17L_A911 oIL17L_C805 oIL17L_C804 oIL17L_B406 oIL17L_B405 Octopus oIL17L_A903 IL17-like oIL17L_A905 A oIL17L_A902 B Human C oIL17L_B403 IL17 D oIL17L_A929 E oIL17L_A922 F oIL17L_A924 oIL17L_A925 oIL17L_A923 Lottia IL17-like oIL17L_A921 oIL17L_A912 oIL17L_A915 oIL17L_A919 Capitella oIL17L_A918 IL17-like oIL17L_A914 oIL17L_A920 Crassostrea IL17

Extended Data Figure 5 | Expansion of interleukin 17 (IL17)-like genes. and the Scaffold D gene is enriched in the viscera. c, Conserved cysteine a, Phylogenetic tree of interleukin genes in Obi, Cte, Cgi, and Lgi. Mammalian residues in human IL17 and invertebrate IL17-like proteins. The human IL17 IL1A, IL1B, and IL7 used as outgroups. Human and mouse IL17s branch proteins share a conserved cysteine motif comprising 4 cysteine residues, which from other members of the IL family. Octopus ILs (as well as all identified may form interchain disulfide bonds and facilitate dimerization56. Octopus invertebrate ILs) group with the mammalian IL17 branch and are named IL17-like proteins also contain this four-cysteine motif, highlighted in yellow. ‘IL17-like’. The 31 octopus genes are distributed across 5 scaffolds: scaffold A One octopus sequence encodes only 3 of these highly conserved cysteine (Obi_A), 23 members; scaffold B (Obi_B), 4 members; scaffold C (Obi_C), 2 residues. These four cysteines are also present to varying degrees in Lottia, members; scaffolds D (Obi_D) and E (Obi_E), 1 member each. b, Expression Capitella and Crassostrea sequences. Two additional conserved cysteine profile of 31 octopus IL17-like genes. Heat map rows are arranged by order residues were found in the octopus sequences and are highlighted in red. The on each scaffold. Blank rows indicate genes not expressed in our first cysteine residue is found in all invertebrate sequences examined, and none transcriptomes. The 27 genes found in our transcriptomes have strong of the mammalian IL17 sequences. expression in the suckers and skin. The scaffold C genes are enriched in the PSG

G2015 Macmillan Publishers Limited. All rights reserved RESEARCH LETTER

a b Vertebrate-like Chemosensory

c * Opsin †

d Frizzled ‡

e

Aplysia-like Chemosensory Adhesion OL Sub Ova Skin PSG St15 ANC Supra Retina Testes Viscera Suckers

3 2 10 1 2 3 Row Z-score OL Sub Ova Skin ANC PSG St15 Supra Testes Retina Viscera Suckers

Extended Data Figure 6 | G-protein-coupled receptors. GPCRs, also the Aplysia chemosensory GPCRs57 and 11 GPCRs are similar to vertebrate known as 7-transmembrane (7TM) or serpentine receptors, form a large olfactory receptors. c, We identified 4 opsins in the octopus genome (from superfamily that activates intracellular second messenger systems upon ligand top to bottom): rhodopsin, rhabdomeric opsin, peropsin, and retinochrome. binding. This figure considers a subset of the 329 GPCRs we identified in d, The octopus class F GPCRs comprises 6 genes: 5 Frizzled genes and 1 O. bimaculoides. The full complement of GPCRs is presented in Supple- Smoothened gene (*). e, Thirty octopus genes show similarity to vertebrate mentary Note 8.5. a, b, As reported for other lophotrochozoan genomes, the adhesion GPCRs. octopus genome contains chemosensory-like GPCRs; 74 GPCRs are similar to

G2015 Macmillan Publishers Limited. All rights reserved LETTER RESEARCH

Alpha 7 Beta 1 Alpha

+

O

Obi

m b

7 i

_ Alpha 1-4 1 8 9 _22017732 9/10 5 2 0 Ct a L 4 6

7

2 6 m Lgi_14 g 2 Mmu 0

5

Mm H e H 0 1 8 i

1 _ 0 a7 _165266 s 2 0 s 89 like 0 1 1 6 a H 8 b C a_ 3

2 u _A 3 4 C _ 7 0 C 074 5 6 t _A 00 2 e_ ACHa10 6 1 2 0 AC _ 5 te_ A Alpha 1-4 1 C 4 _ _ * 0 3155 C 3 e i _ 1 278 C e 2 e_007950 1201 2 H 2 39 m

t u C b 6 C t H m _ACHa7 21 m 5 e 2 H 2 4309 9 a m 0

t Cte_221893 t C m a C _ a9 O _1 e 782 D 28842 a1 9 1 8 3 e_ Dme_ 4 i_ 2 5 D like t _ _ _007 0 M 8 Beta 1 e_ gi 2 3 417 Hs Cte 11231 0 e e _ 0 Ct L 3 8 1 13 Ob 0 303342 7 25 Ct Ct e _ 8 _6 41 Dm _135 5 e 4 3 e Dme 008402 52 C 50 m 4 _ 3 Dme_02 Alpha 7 C t 1 D e_ t e_ 9 7 1 48 5 e 405 Dme_0 4 9 m 1 _ C 62 5 D e_919 5 1 _16714 2 Lg t 5 e 22 m e 2 89 Ct t 3 _ 38 9 Lg i C 16714 4 _ 8 6 te_ 525 8 2 90 7 C i_ 19 3 Obi Lg 19 9 Cte_ e_1 547 8 t 08 i_64 6 C i_212 O _22 L g 30 Alpha 9/10 gi 8 _220007514 34 L m b _ 1 2 i_ 73 3 22 0 73 Obi 3 07 Obi 2 te_ 160143 1 m 00 977 29 C 6 26 Ct te_ 50 _ 73 6 18 03 m 22 e 2 C i_ 8 _2216 m 22 348 4 64m 8 03 Lg i_ 20 4 6 b 3 Lgi O _2 0 8 96 i a4 _1031179 b 22 H O Cte m+ O i_ C bi_22 b A 4 _ _ a 159 O Mammalian 1 00930 Mmu ACH Ha2 Lg 244 a_ C s A 2 i_2 H _ a Alpha 1-6 Alpha Obi C 5m u H te 10 m a3 _2 M AC H _2 710 a_ 203 0 s AC 4309 H _ a3 Beta 1-4 741 H L Mmu AC 6 g 2m _ i_ sa Obi_ 5 + H Delta Cte_6259821 u_ACHa 220 59 m Ha6 M C 5 24879m _A sa Gamma H ACHa Lgi_524 5 + Mmu_ Ha 3 Ob Ct 81 a_AC Epsilon i_ e_525 Hs CHb 22 A Alpha 03 45 mu_ b3 5826 M CH Lgi_ m _A 1 10 + Hsa Ha 259 C Cte a_A 1 Obi _228 7 Hs HRA _22 74 AC 00 5 RA1 696 Hsa_ CH 4 _A L m+ gi_ Mmu RG 15228 ACH 9 sa_ Ct H RG e_529 ACH C 40 Mmu_ te_20 RE 12 ACH Obi_2 63 Hsa_ * 2037405 CHRE m u_A Lg Mm * i_15229 Obi_22000 0 Hsa_ACHRD 911m+ ACHRD * Mmu_ Lgi_5238 RB1 * 5 Hsa_ACH Cte_ HRB1 199133 Mmu_AC Lgi_16244 2 Beta 3 1 Mmu_ACHb Non-Alpha Dme_0077724 Hsa_ACHb2 Lgi_128212 Mmu_ACHb4 Lgi_168269 Hsa_ACHb4 Obi_22034659m Obi_22012266m Obi_22006184m+ Obi_22012265m Obi_22006182m Obi_22012263m+ 60m Obi_ Obi_220346 22029097m 086 Obi_22 Cte_176 029099m 4098 Obi_2 Cte_22 2012259m 3 O 48 bi_220 Cte_24 04961m 8233 Obi_22018127 Cte_21 Obi_2 m 99412 20 Cte_1 18129m 92 Obi_220 + 2139 Cte_ 12260 5m Obi_220 m 733 12262 bi_2202 Obi m O _22 e_214558 012 Ct 2 Obi 261 _220 m Obi_ 30 Cte_2601 9 094m 22 Obi_ 03 Putative Cte_5297 084 90112 220 2m te_ 2 Obi_220 308 C 06 40m Non-binding _204 1 Ob 30 Cte 50 i_2 843 Obi 20 m Putative te_114 21 06 C 54 _2 51 Ob 200651 7m _11 83 Cte i_2 Non- 3 Obi_2200201 8m e_1820 556 Ct O _20772 8 bi_ 0m e 292 2 65 binding Ct 5 2 O 2 19 _ 9 bi_ 00 m Cte 2 6 3 Obi_ 2 52 e_931 00 2m t 284 22 65 C 9 8 Lg _ 9 i 006 21 te 7 _6 m C 5 L 7 520 10 gi_ 9 _ 6m+ 1 58 te 5 2 Lgi 6 m C 5 _ 90 Alpha 45 7 Lgi 9 1 1 632 51 20 32 2 _2 L _162211 i_ i + g 5 b Lg 6190 i_ _1 L 9 O 44m 1 g 6 gi i_ 2 L Lgi_14656 9 * 336 0 5 0 1889 6 2 10 L 3 22 _ 7 gi 1 * e 8 48 _6 t 9 L C _1 + g 9 e 92 m L i_ 486 64 * Obi_ t _ 7 9 g 1 C te 9 Lgi_11487i 3 3 1 62 _ 5 * C 5 7 8 97 Lgi_9627 4 1 6 5 3 6 0 131 1 9 * 2 _ 9 L 8 3 2 6 3 g 0 1 1 8 Lgi i_ * i_ Lgi i_ 3 5 4 1 Lgi 10 g 8 _ 5 Ob L i_ 2 0 L 54546 3 OL g _2 091 e 12 8 g _ 6 L t 0 O 1 62 2 i_ 4 21 Sub * C b Ova _ Lgi_1

6 54 Skin 218 4 i_ e 191 St15 ANC t 1 8 O 0 PSG * 1 2 30 9 C e_ C b t 2 7 _1 O t i _ 608 0 2 * C 87 e_ Supra te 1 L 9741 6 bi_220 22 10 Testes Testes te_9 16759 Retina C L g * _ 3 11 C 49 3 L g i _13 03 0 8 9 4 7 L gi te i_ 6 3 Viscera Ob 8 e_12 7 5 g 31 1m C L _ 1 247 13 L i_139 Suckers L g 226 7 8 76 Lg 10553m Ct Lgi_5 13 7 g i Lgi 8 9 i L gi _ 2305 8 C _ C i _7 8 5 _9436 Cte_6306 g 8 1936 6 22 74 5 184 t i _ t m 5 te 46 _10 7 e i 5 1 e e_1042 _99 1 663 3 _1 3468 _ 7 6 t 1 818 6 _ C 7 _ 0 e_ e 5 1 4892 t t 1 _13 2 9 C 2 0 11 6 09 0 46 18 8894 C 08789 e 2 7 C 0 1 2 t 89 e_162907 1 _ 1 2 9 2 6 e_ 2 04 _ C t e 0 7 _ 7 t e_152 _ Ct 4 e t e_2 5 4 C e t 4 e 8

3 C t

t C 8 m 0 C

Ct

C

C Cte_115452 Non- 3 2 10 1 2 3 Alpha Row Z-Score

c Lst_AchBP Y N - A I S K P E V L T P Q L A R V V S D G E V L Y M P S I R Q R F S C D V S G V D T E S G - A T C R I K I G S W T H H S R E I S V D P T T - - N S D D S E Y F S Q Y S R F E I L D V T Q K K N S V T Y S C C P - E A Y Hsa_AchR7 Y N S A D E R F D A T F H T N V L V N S S G H C Q Y L P P G I F K S S C Y I D V R W F P F D V Q H C K L K F G S W S Y G G W S L D L Q M Q - - - E A D I S G Y I P N G E W D L V G I P G - K R S E R F Y E C C K - E P Y Obi_10697+ Y N S A N E V F D A T Y P T N V L V S Y N G F C H W V P P G M F K S T C Q I D I A W F P F D D Q K C T L K F G S W T H D G R Y L D L Q L D G D G N G D T S S F I R N G E W K L I A V P G - S R N V V K Y D C C P Q I Y L Obi_12266 C N S V S G D F S F D V D K E V T V K Y D G F V H L H I D K I F K T Y C R I N V E N Y P F D Q H E C D I T V C L E H Q M Y M E E T I E D F - - - V I D V K L K T K S N Q W N F S F E E T - E M E K D D ------V I Obi_12265 C N S V T G K F S F D D N K E V T V N R N G D V N L Y I D K I F E T Y C R I N V E K Y P F D E H E C D I S V C F E H Q M Y V E E T V G E F - - - D Y E V K L Q S A S N Q W D F N F E K S - D V E N D N ------I V Obi_12263+ C N G V M D R F K L D E D T E I F L T N E G T V F L Y I D N V F Q T Y C R I N V N K Y P F D E H E C D L L V C L N H Q M R E R K R K P S K ------Obi_29097 C N S A S G K F T F D E D T G V T L T S N G N T S L Y I D R I V N T Y C K V N I N K Y P F D E H E C D I S V C F R H Q I N T E E T L N N F - - - V Y N V T Y N P T Y N Q W E Y T F K E K - D I L K E G ------I I Obi_29099 C N S V T G K F T F D G N W G V T I K S D G S V H L H I D Q I F H T Y C K V N V N K Y P F D E H E C D I S V C F E H Q M N L E V M L H D F - - - M Y R V T Y K P I S N Q W D Y R Y E Y R - E V E K E E ------I I Obi_12259 C N D M S G N F A V H K - G G A T I E Y D G T V T F H M D G I F Q T Y C T I D M H K Y P F D E H E C Y I K S C L R H Q K Y K E Q T I Q N F - - - S F Y N M Y N S S S D T W D Y K F V V G - D V M E N G ------I I Obi_04961 C N D M S G K F S Q H E G E G A T V K Y D G S V S L H M D G I F Q T H C T I N M L K Y P L D E H E C N I T V C L G H Q E N I E K T M Q S F - - - S F N N L H N A E A D K W E Y K F A V G - N V T E K E ------I I Obi_18127 C N S V E G K F K F D E D K Q V S V R H D G I V N L N T E G I F N T Y C E I N M E N Y P F D E H I C ------Obi_18129+ R N S A E D K F I F Q K N K Q V F I K Y D G T I N L H I D G T Y R I Y C R I D I D K Y P F D E H I C Y L S I C L G T E M E N Q E T I Q F Q ------E W E F R L E K T - S E E ------N F Obi_12260 S N A E K V T K I P S L S E Y I T V S Y D G R T S Y F I R S I Y R T Y C S I D F Y K Y P F S L N V C K I Y F W L S N E M V S Y L K L Q N V - - - D L A N T S N I W T T I W N I Q L D G H - K Y D D D N S ------T D Obi_12262 C N A E T I Y N V S T P I P E A V M L S N G T I K T S T T L V Y T I H C K I D N T K Y P F D K Q A C E V H I C L P L S K L N N V R I K T I - - - - T T F K K Q V T L R N W N V E I D K V - L Q N H N E R ------I F Obi_12261 C N A I K L I G N H R Y E R Q V T V W Q N G V V E E E S F Y T Y Q L F C G V D N S R Y P F D V Q N C P T Y I C L P Y Q M N N L T L I K S L - - - R T D P V K N M E G - - W H I H T S T E - P P I T Y N D ------Q E Obi_30094 F V N G L S A V E S A A E P A I R L E Y S G N L N K Y Q K L S L K T F C P T E K D Q Y S F S - - - C P F M L K T Y P L P S T Q E R L R V T - - - D F E V N E K F Q S H Q W N A E V N T N - E T R I Y N E D ------Obi_30842 C N S V N E Q D D S N I N R E V Y V H Y N G T V E L W S L K Y I E T Y C Q V N A Y T Y P F D D Q K C K I Q M C V G L H S P D E T R L K T I - - - C Y W N M K F T E S Y K W D I H F S G K - A N G I N S Q ------S S Obi_30840 C N S V N E Q G D S N I N R E V Y V D Y E G T V Y L W S L K Y I E T Y C Q V N A Y T Y P F D V H Y C G I E M C V P L H S P N E T R I Q T I - - - Y Y R N M N F T E N Y K W D I H F S G E - A N G K V E E ------F S Obi_30843 C N S M T Q S E E K D S L D D V L I Y Y D G F V R M L S F T L L Q T Y C Q V N A Y S Y P F D E H K C E I R M C S A T Y H T D E A N V T S F - - - L L N V Y S E E E N Y K W Y M S I S D Q - E T Y ------S S Obi_06517 C N S M E N S E D K D D F P E L W I F S N G H V V M Y S F R L L N T Y C E V N A Y T Y P F D K H M C E I Y M C V A L H S V Q H T R I K T L - - - D Y H E L N F I Q N Y K W D I T L E G T - V N A T N D K ------F N Obi_06518 C N S M D K S E E N D G V G E L M L T Y T G W I N M W S F R L L H T Y C Q I N A Y T Y P F D E H T C E I Y L C V A L H T I N H T R I K E L - - - I Y E D S K F T Q N Y K W D I N V S G K - V N G T D E L ------F S Obi_15560 C N A M K E S E E K G S F L E V K V F N N G R V Q M R S L K L L K S Y C T F D A Y A Y P F D Q H D C E I Y I C V A L H D P V H T R I R T L - - - T Y D N L N Y S P N Y N W D I D Y N G I - K N A S D Q R ------F S Obi_06519 C N S M K E S D D E D N F P E V R I F N N G L V E R W S L P L L Q S Y C E V N A Y A Y P F D E H I C K I Y M C I A L H T P Q H T Q I N T L - - - I Y Y D A D H T Q N Y K W N V N I S G E - M K G I K F S ------F S Obi_06522 C N T M K Q S E D K D N P S E V S V Y F N G S I E I S L I K L L H T Y C E I N A Y T F P F D E H T C N V S M C V S L Q E L H H A K R T K L - - - T Y K S - R Q A K H S K W D I K F S G G - T N G T N Y Y H - - - - - Y S Obi_06521 F N S R T E S K Y K Y S Y Q D V T V Y S K G S V E M V S I R F L H T D C Q I E A Y I Y P F D L Q T C Y I F L G I P T Y K P Q D T K I K E I - - - L C G K E N D T T N Y Q W D I T L Y C N - V D S A N K H ------Y N Obi_06520 L N T L M E T Q S K N N F L E M T V D F N G S V T M V E I K L L Q T F C E I Y V Y N Y P F D A Q T C V I S M G I P S H K F Q D T K I K E L - - - S C Y R K S D I S D S E W G I S F S C N - V H G T N N S ------F S

Extended Data Figure 7 | O. bimaculoides acetylcholine receptor (AChR) c, Divergent octopus subunits lack nearly all residues necessary for ACh subunits. a, Phylogenetic tree of AChR subunit genes identified in Hsa, Mmu, binding. Alignment of sequence flanking the cysteine loop (yellow) of the Dme, Cte, Lgi and Obi. Black asterisk indicates a Dme sequence that groups L. stagnalis ACh-binding protein (Lst_AchBP), the human and octopus alpha- with alpha 1-4-like subunits despite lacking two defining cysteine residues. 7 receptor subunits (Hsa_AchR7, Obi_106971), and the 23 divergent AChR b, Expression profiles of octopus AChR subunits. Genes ordered as in the tree subunits. Essential ACh-binding residues on the primary (pink) and (a), starting from the grey arrow and continuing anticlockwise. Putative non- complementary (blue) side of the ligand-binding domain are indicated26,with ACh-binding subunits are highly expressed in the suckers. One sequence conservative substitutions in a lighter shade. Outside of the binding residues, was not detected in our transcriptome data sets. In a and b, red asterisks residues shared between the alpha-7 subunits are shaded in light grey, with indicate subunits with the substitution known to confer anionic permissivity58. bold letters for conservative substitutions.

G2015 Macmillan Publishers Limited. All rights reserved RESEARCH LETTER

Extended Data Figure 8 | Active transposable elements and gene expression (defined as having at least 75% of expression in a single tissue; see expression specificity. a, Transposable element expression across 12 tissues. Source Data file for this figure). P value indicates the F-statistic for the b, Correlation between the total transposable element (TE) load (in bp) in the significance of linear regression (H0: r2 5 0), with tissues with a P value #0.05 5 kb regions flanking the gene and the fraction of genes with tissue-specific indicated in pink.

G2015 Macmillan Publishers Limited. All rights reserved LETTER RESEARCH

Extended Data Figure 9 | Synteny dynamics in octopus and the effect of loss rates. Branch lengths, estimated with MrBayes55, reflect extent of local transposable element (TE) expansions. a, Circos plot showing shared synteny genome rearrangement (Supplementary Note 6). c, Enrichment of overall and across 6 genomes. Individual scaffolds are plotted according to bp length; specific TE classes (base pairs masked) around genes from ancient bilaterian scaffolds with no synteny are merged together (lighter arcs). Despite the large synteny blocks, including those absent in octopus (see key). Asterisks indicate size of the octopus genome, only a small proportion of the scaffolds show Mann–Whitney U-test with P value ,0.02. d, Transposable element insertion synteny. b, Synteny reduction in octopus quantified based on synteny inference history (Jukes–Cantor distance adjusted, see text) into the vicinity of genes using gene families with at least one representative in human, amphioxus, from ‘lost’ synteny blocks. Note that only one SINE peak is present; a more Capitella, Helobdella, Octopus, Lottia, Crassostrea, Drosophila, and recent peak (visible in ‘All genomic SINEs’) cannot be recovered from Nematostella. Drosophila, Helobdella and Octopus show the highest synteny those insertions.

G2015 Macmillan Publishers Limited. All rights reserved RESEARCH LETTER

Extended Data Figure 10 | Cephalopod phylogeny and novelties. a, Whole- octopus genome. c, d, Novel gene expression across multiple tissues. Bars depict genome-derived phylogeny of molluscs and select other phyla showing the all cephalopod novelties; dark grey indicates sequences with no similarity to relative position of octopus at the base of the coleoid cephalopods. For methods non-cephalopod genes using HMM searches (see Source Data for this figure). see Supplementary Note 7.1. Members of the cephalopod class are indicated in c, Counts of tissue-specific novelties in a given tissue. d, Proportion of blue, scale indicates number of substitutions per site. b, Phylogenetic tree of expression of novel genes versus total expression in individual tissues. CNS reflectin genes. Reflectins are cephalopod-specific genes that allow for rapid (central nervous system) combines Supra, Sub, OL and ANC expression data. and reversible changes in iridescence. Six reflectin genes were identified in the

G2015 Macmillan Publishers Limited. All rights reserved LETTER RESEARCH

Hsa_d achsous_ *

2

_

CRA_a_EAX

oRR oRR738-

IX 5

1 1 Z -

A 709_C80_U 54-1 Hsa_C _ .2 Hsa_ Hsa

1 H AmphiD A 4 Hsa_flamingo m 4 VIII Hsa_CDH C Lgi_174 1 H 1 m D 3 H Hs r

Hsa_CDH-8 p Hsa_fla te_181 Hsa AmphiCDH I 738_C28X 1 0 Hsa_P H g sa_C C 1 C h

h p Hsa_ 65 N Hsa_Fla sa_C _P_ NvDCH 9_oT r2 _ n sa_CD _ d

i i Hsa_ a_C h P_ P 5 2 v C l h 4

3 _ 04942 r5 2 DH-24_1 i 54. _ dhr I C r 1 Hs P 41 1 _CDH-18_1 C 8 Dme_st i dhr3 18

d oT Hsa 2 r 3 p h T 5 1 CDH-11 C P C 2 2 C C . h 938-

h CDH_EGF DH CH d 0 h 9 d C D iCd Hsa_CDH-6 a_CDH-1 DH-11 _ 0 _ m 06 p DH-8 r CDH h 0 D Hsa_ _CDH-18 335 hr3 5 5 228 R h C 59 1 a 4 H-18 FA 927 Hedg 04_C8 r 590 5 in L _ -19_1 d 4 -4 _AAG27 B H mingo_ m v 29- 3 s 4 H S 164 7 0 v 54 9 phi -19_2 S R 3 gi_ 0 6 5 3 C .1 3 9 5 go_2 C _ phiCdhr 1 4586 A o 2 Cte_1 N o -22 H _3_EA Nv N i 9 b NvCdh m 1 r Nv v 8 _EGF_LAG_ DH-10 NvCdhr 7 NvCd 16 95 R L a Amphi e_1 23 Amp m i_ _ 164 2 h _2 AmphiCELS 16 C A t i 2 0 dhr4 _AA 2 rr _2 N R226_C10Xgi_2 Lg i_ 2 CE g _ 2 2 H a DH-9 A g i 2 hr H y C g 3 _ E L 2 1 L i_ iCd D 4_ sc2a 1_AAG_ _ L RR8 D - AAF 5 82594 953 2 nigh L Lg g h _CAA8 _Dsc2 sc Hsa LAG_7 W64914 L 3 o C DH- H S L e_ -C D 3 1 Lgi_ _ 5 -2 VII oRR696- D 2_D _ R _ D 99 Ct mp H-1 1 - 1 619 AmphiC sa _ 3 _ t n - 26_a Hs K A CDH-4_1 H-3 3 li 4_ PCDH-8 00080.1 5 H a_N X AmphiCd _ 1 collin lin n_ C C 7pGR_2 pg s - l 4_2 i a_Fat 3 Hsa_C sa_C DH-15 o Cte_1783te_218758te R H _E-CDH 0 . sa a_CD H co n- oRR536_C13E Cte r_1 1 H a C 228852 .1 H CDH-13D sm ei in-2 Hs _ C e oglein- Amp Hs mo 4_ _ _ d s ogl o 22 _1 Hsa_CD m Q6V0I7. Hsa_ e ke_prote R Cte Hsa d 5721 -li R228Lg D h Hsa _ esm 16_13 iF 54 Hsa_ _ me_ sa esmogle 3 i 2220 AT Hsa_desmocola_des _1 H s hr1 X _C32_W _d DH- un 0 H a g 6L2T_W 5163 fat 2 Hsa_ds iCd 2 N 0 H h ot 9 8 4 o NvCDH2vCdhr Hsa_CDH 586_Csh 4 e RR1 D N Hsa_Cmp R _ 04 lik e 2 n me_ NvCDHv A 21724 6 R Cte_2 C oR m _ 2 40_C DH 6 D e _ 83 LS C 8 E Lg D Ct phiCdhr191 0 lsynteni 2 H 3 Cte _ iC a N i_1 1 6 1 46 oRR1 Cte_ 5 6 -N Am 15 v E 54 Cte mph 417 2_c VI FAT6 4L pp_XP_002737277.1 C Hs Lg 705 A 8 A 222745 2 _ 161 Cdhr1 5_ II Hs a 03_C33i_1044 -lik s 6 _ m 3 Lgi_232i_ 8 09 Hsa_a PCDHphiF 0202 _1 _P e Lg mphi23 _T i_ 8 1 in-1 CD A 87 A g 62 -3 P T-l LE4 L in C _Fa R05 16 nten n Am D H_ i Cte_22 1 ke i_ n-1 phiCH_FaFat_3t_2 oR g alsy Lg L e_181561synteTN AmphiCi_154208 a_c cal LS dhr12t_1 Ct s C synteni oR Ct H sa_ hi al Dme_CDH_87AR899_Ce H C N _2284d T hr8 Ampme_ 0451 D CLS 1 000 Cte_215553 43 _2 0 53 oR 12X Nv e 21 6 Lg Ct 6 R68 i_ te_ _22 Am 229249 C te 248 1-2_C13 C 9 14 phi 2 9 AmphiCdC Cte_210002i_2 AmphiC dh 9916 14 r Lg Ct 16 Lgi_159 e hr7 i_15 iCdhr Lgi_229247_19 dhr Lg 72 Hsa o Lg 917 6 Amph_230236349 2731401.1 _dachso RR88 i_154544 Lgi 1 P0028 8_C12D3 Lgi_2h_X 18 us_ Cte s_ 17 V C _1 gi_ 1_E te_ 395 L H AW6813 22 Lgi_237871237870 sa 960 i_ 6 s_pcdh11_Xlinked_XP_0 Amph_PCDH-1668 9 Lg _237869182 Dme 3.1 Lgi 17 iCdhr Lgi_ 84 _d 67 ach 18 Lgi_23786823 Lgi so _ Cte _1430 us Lgi _1 02741390.13 94 Lgi_166623_238827 AmphiPCD5638 Lgi 473 i_171 Am Lg phiPCDH1 gi_23768088 Hs H2 L 678 76 Hs a_PCDH-1 Lgi_1 6 a_P Hsa_PCDH-19_a_PCDH-172 Lgi_237 H Lgi_164389p sa_PC 39 Hsa_PCDH_alphHsa_ D Lgi_175623 PCDH-10H-19_c 1730 23 H Lgi_ p sa_PCDH_alph _1 Lgi_1738 Hsa_ a-C2_1 Lgi_237867 PCDH_alpa-C 5875 Hsa 1_1 Lgi_16 _PC ha-5_1 1 Hsa_ DH_alp Lgi_1718227182 PCD ha- Lgi_1 Hsa_ H_alpha-8_16_1 PCDH_a Lgi_232792r20 Hsa_PCDH_ lpha-11_1 Hsa_ AmphiCdh01617 PCDH_ alpha-10_1 Cte_2 a CDHR_3 Hsa_PCDH lpha-12 Hsa_ _alpha- _1 _156564 Hsa_PC 1_1 Lgi 60 DH_alph gi_1565 Hsa_PCDH_al a-2_1 L _C3 Hsa_PCDH_alphapha-13_1 oRR180 oRR277_C2DF _C2DX Hsa_PCDH_alpha--3_1 6-8_T060 Hsa_ 4_1 opRR52 K PCDH_alpha- oRR836_C5 Hsa_PCD 7_1 232259 H_alpha-9_2 Lgi_ Hsa_PCDH_alpha-9_1 Lgi_160343 Hsa_PCDH_beta-13 Cte_199156 Hsa_PCDH_beta oRR625_C15 -8 61952 Hsa_PCDH_beta-15 Lgi_1 Hsa_PCDH_beta-6 Hsa_Ret_a _96Ca Hsa_PCDH_beta-5 Dme_CDH ta-4 oE098_C4MR_IZV Hsa_PCDH_be oRR092_C2_IZ IV sa_PCDH_beta-3 H oRR444_C2Y_IZ sa_PCDH_beta-2 H opRR671_C3A_IZ Hsa_PCDH_beta-7 oC 0 831_C2X_IZ Hsa_PCDH_beta-1 oC a-9 826_C2Y_IZ Hsa_PCDH_bet oRR894_C2DY_IZ beta-14 Hsa_PCDH_ oT039_C3X2 beta-11 oC8 _IZ Hsa_PCDH_ 32_C3_IZ a_PCDH_beta-12 oE097_oT8 Hs -C5_1 o 98_C3_ gamma 1 E093_C6 IZ Hsa_PCDH_ opE100 TDS_IZUO _1 _C3DR JV Hsa_PCDH_gamma-C4_a-C3 opC _UO H_gamm 2_1 829_C3 WV Hsa_PCD ma-B opC8 DR_ZU DH_gam B1_1 25_C3 OWV sa_PC mma- opRR578_C3DRDR_UO H DH_ga -B7_1 o WV sa_PC ma pC82 _UO H am 6_1 oE 8_C3DRS_UOWVQV 1 096_C Hsa_PCDH_gDH_gamma-B-B5_ oRR 6A_Z sa_PC 4_1 576_C2 UOWV H DH_gammama-B oT096_ _Z gam _1 oR C3_Z Hsa_PC H_ -A11 1 R026_C _PCD gamma 12_ oG Hsa ma-A 1 808_C3S3A_ _PCDH_ gam _ oG Z Hsa DH_ -A10 _1 810-8 X_ _PC gamma A9 oT1 13_C6 UOWJQ Hsa H_ ma- _2 o 86_ PCD _gam a-A8 J05 C3L TDNS sa_ DH m _1 oG 4_C6 _Z F_ H PC _gam a-A8 812_C6ATA ZUOW Hsa_ DH mm 5_1 o TNF_ J _ga a-A J05 ZUOWJQ QV Hsa_PC DH mm oG 2_C NSF _PC ga _1 809_C6TD _Z Hsa DH_ 1 oJ0 NF_ UOWJ V _PC a-A _1 o 53 6T ZU sa m -A7 G81 _C3 DN OWJQVQV H _gam ma _1 o 5 _Z F_Z _PCDH_gamma-A2_1DH am A6 pR _C6 UO Hsa PC g ma- 4_1 oB R54 TDNF WJQ sa_ gam A oJ051_C333 4_C5 V H _PCDH__ ma- _C6T T _ZUO Hsa oF9 DNFX W _game_225723 9 73 3DA_ZNF JV Hsa_PCDH Ct 22241725 oB323_C_ _IZ _UO Cte_ 2 26 oB3 C3 UOWJQWJQ _18 79 HT Hsa_PCDH Cte 2 oB326_C24 4S SX V e_2 _C6D2_Z 2_ V Ct 222588075 oB32 V UOWV te_ 6 7 C 22 78 oB327_C2_C 5_I X_I _ 18 0 oT84 Z ZQV Cte 2 9 4S e_ 823 56 oB317_C X_Z Ct te 60 2 0_ 4S C 17 7 oRR C4 _Z V e_ 721 00 oB318 X_IZ V Ct 07 6D Cte 188 oB316_ 5_ SX_IZUWJ 33639 o _ C6_I e_2e3 96 B3 C6D2_ZUW Ct t 8 7 o 1 Z C 22 63 B3 0 C6SD UO 19 4 oB3 0 _C WJ Q Cte_ 2 603 5 o 9_ 4 _ V _ 5 6 B3 11_ C _Z IZ J Cte 1 3 o 1 4_ U QV te_ 926 L48 2_CC Z OJ C oR 3_Z V Cte 19513685 o R 2 _ 2 31 B3 883_C 4_ oK7 4_Z Z Cte te3 627 32 C e oB3 _C6TDS_C t 49 3_ C oK753_C62 _C6MS_Z Cte96860 oK 9 Z 03 o _C e_144547 K74674 6 F_ZU Ct Lgi71708d_a a o 7 SN_I ke d_ o B328_ _C6S Lgi717 e R _C TSN WJ O -lin k o R Z WJQV F 8 6 F U V _X -lin o 9 2 C6DTTPS F O 1 Y 27 o R 85_ 6 _ZU _ZU WJQV 1 _ RR1R _ o 71 C6 gi_123092pLgi51965i8209451 F C SF_UOWJF OW OW DH- L g 0 o 9 2 6TMFS D L 2 o T 7 2 _ P_ _I J QV PC 39034 15 F 671 7 6 C Z Q CDH-11 i_ oF979 _C6MS_ 5_UO WJQ V P g i62427 oC819_C6TSE8 C6 _I Z sa_ _ L 2 _ U H a Lgi_1Lg X2 2 o _ C SF_ Z O V s o E8 5_C6C 6M W UOWQWJV QV H C7 T6 6 _ 3 o 16 S F IZ V Lgi_139008_C7_U0 5206 oC8RR T X 5 7 _ SB S F 2_ W 71692 o 5_ C6 X X_Z JQV V Lgi oI31I3 9 _ UOW oRR9 Lgi 890 1 5 C N Z 10 6 opR 16 7_ 3 6_ DS X UOWUOW oRR4 Lgi5168 1 2 oR _ 2 _ C6 _ _I 5 1 o 5 C C IZU _ Z ZU J gi_10463205 1 oN657_C4DTI3 R3 R8 _C2 6 IZ Q 1 V 2 U J L i5 51 o 18 M MSNF U JQV V i_ g 0 ZUV 8 _ OW OW O Q Lgi_104g _Z oC818_CI321_C3_IZ 1 X_IZIZW S W W V L o _ 4 6 F_ZUO L 7_ 6 _C _C2 J i_1 UW oRR H8 C5D_ J QV J JQ C7_ZC oR _ Q QV Lg _C _ oRR 3X ZUOWJQV V _Z UWVWQ V 54 X oRR760 R oH849_C6H 76 _ 853_ IZ 2 _ IZ WJ V oRR6 C IZ R242R V o 27 7 3_ 6 6_ ZUO ZUO Q SF_ S R oRR H847 6_ 6 QV 27_C6C _ o 7 C TS F oRR746oR o _I W o 1 ZU W H852_C6 _C C _ 7_ 5 oL484_CR 6 oR 82_C6 _ 6 D F ZUOWJ IZ oR R 5 C6H _ UOWV o 71 _C6 6H HTSFX_ZF U A8 ZUO o I oRR8 W R R 447 X_ZUOWJQVZU TPBFX_OWJQTF_ZUOWQo _ C5_Z o O oL48 o RR1 R731_ _C T V oP0 RR R 6 T WJQV oR RR5 8 X_ T S UO o KTG_Z O 55_C6TXC4 FX W 52 _C B QV o 3_ 6 S E V _ UOW V oRR117 R W 8 S Z o R 5 S _I 6_C5_ZU O QV o R2 7 6_Z S IZUO F _ Z Z 9 1 A844_C5_ Z W _ o RR17 R72 69_C 6T o 5_C C6D GF ZU X_IZUOWZ Q _ V oM R R79 1 _ G o o RR1 3_C6T_IZU C6T SGFX_ZUOWQ oA848_84 W o M0 OW OW 854_C6 o 9 V ZU T o R 1 C _ZOQ V oA836_C6T N S oRR T 840_C4_V W o T U A830 C3 oA ZO V oA pR _ X_Z W A o QV oRR730_C RR7 8_ o o o _C6T o 6T III 0 5 U o 41_C4STF_ oA 411_C5_ZUO p 0 65 7 X_ o J C 6 W p 5 9 TS M X_ R A Q R V 61 4 Q JQV J V V B3 C6 Q _ 6T S O Q 7 OWQV 8 6 TS_ T7 2 C6 837 R161_C 5_C _ 6T _C4 J 9 S V Q J 8 _ C6T_ R 8 50_ J I opA R 8 oD 0 9 FX2_IZO Q U W Q 38_C6T oA843_C5X_ZUO 839_C5_Z 438_C5_ZU 37_C4_ Q _C C6 FX4 U oA8 42_C4_ _C6 _ F V 0 3 _ C6TF S 0 _ 6 8 J 2 _ J J 2 8 OW 5TF 39 6 T S_ OW ZUOW WJQV C _ 5 C6 3 C F_ZU 5 U 0 C6

U 6 0 JQ oD443 oA WJQV Z 852_ 1 _ oD 6T 6T_IZ BX_ _ Z 3_OWJ W

W _ UOWJQV W 5TS_ _ _ _ _

_ 1 4 5 UW oA8 _C TSDFX_ _ C6T ZU Z O _ D4 UOW C Z C C C6 C6 TK 5 H I O _ 5 J O O _ oA oRR5 ZUOWJQ _ TS J V 6 C UWJ OWJQVUO IZW o oRR8 I _ C6_ WJ V 6 _Z C Z Q C 6 C T U C U 55 IZUOWJQ U I 2 IZU OWJQV 5_IZU _ X_ZOWJ T_ DF_IZ Z O JQV T TS_ U _ 6 6 S_ V _ Z 6 S _ Z C6TSF_Z 6S_ZUOW Z DTS S W 2_ _ZUO OW T_IZUOWJQV Z U I UO WJQ 4 TX F QV T T_IZ 2 _ _ ZUO ZU WJ Q _ _ _ X 6 U QV 1 ZUJQV J 1 X O oA8 7_ I X I OW X T IZ V - D

Z - _Z Z Q TS

O 6TF_ZUO _ZUOWJ Z _ 6 _ W WJQ 5 O Q D T JQ 5 A83 U U C6TX_ C5TF_UOWJQV _ 6 O V I O D440_C _C6 I OWJ V 3

6 opT761_ C o F 3 U 26_C6T_ Z UO W Z O W V O W JQ C JQV o WV _C 2

_ W 2

U V B C 88_C U OW _C6TSF_ZUWJQV W oD43 0 W JQV JQV T 2 J T oRR027_C6SF_ZUO A8 _ T QV O O V V V WV oA853_C6KTSFX_ZUO oA827_C p 2 QV p o 1 JQ 11 V oA828_C6AS_ZUOWJQV H Q W

N659 8 o W

819_C6SX_ o oP0 2 JQ 6 o

A oA833_ pA824_ 8

J V J

QV oA81 C

o o oA A Q Q V oA820_ oA8

_

o

6

1

8

A

o

Extended Data Figure 11 | Phylogenetic tree of cadherin genes. This is a protocadherin expansion (168 genes); IV, human protocadherin expansion (58 larger image of Fig. 2a. Phylogenetic tree of cadherin genes in Hsa (red), Dme genes); V, dachsous; VI, fat-like; VII, fat; VIII, CELSR; IX, Type II classical (orange), Nematostella vectensis (mustard yellow), Amphimedon queenslandica cadherins. Asterisk denotes a novel cadherin with over 80 extracellular (yellow), Cte (green), Lgi (teal), Obi (blue), and Saccoglossus kowalevskii cadherin domains found in Obi and Cte. (purple). I, Type I classical cadherins; II, calsyntenins; III, octopus

G2015 Macmillan Publishers Limited. All rights reserved