ARTICLE

https://doi.org/10.1038/s41467-018-08197-4 OPEN Penaeid shrimp genome provides insights into benthic adaptation and frequent molting

Xiaojun Zhang 1,2, Jianbo Yuan1,2, Yamin Sun3, Shihao Li1,2,4, Yi Gao1,2,4, Yang Yu1,2,4, Chengzhang Liu1,2,4, Quanchao Wang1,2,4, Xinjia Lv1,5, Xiaoxi Zhang1,5, Ka Yan Ma6, Xiaobo Wang7, Wenchao Lin4, Long Wang4, Xueli Zhu4, Chengsong Zhang1,2,4, Jiquan Zhang1,2,4, Songjun Jin1,2,4, Kuijie Yu1,2,4, Jie Kong8, Peng Xu9, Jack Chen10, Hongbin Zhang11, Patrick Sorgeloos12, Amir Sagi13, Acacia Alcivar-Warren14, Zhanjiang Liu15, Lei Wang16, Jue Ruan7, Ka Hou Chu6, Bin Liu16, Fuhua Li1,2,4 & Jianhai Xiang 1,2,4 1234567890():,;

Crustacea, the subphylum of Arthropoda which dominates the aquatic environment, is of major importance in ecology and fisheries. Here we report the genome sequence of the Pacific white shrimp Litopenaeus vannamei, covering ~1.66 Gb (scaffold N50 605.56 Kb) with 25,596 protein-coding genes and a high proportion of simple sequence repeats (>23.93%). The expansion of genes related to vision and locomotion is probably central to its benthic adaptation. Frequent molting of the shrimp may be explained by an intensified ecdysone signal pathway through gene expansion and positive selection. As an important aquaculture organism, L. vannamei has been subjected to high selection pressure during the past 30 years of breeding, and this has had a considerable impact on its genome. Decoding the L. vannamei genome not only provides an insight into the genetic underpinnings of specific biological processes, but also provides valuable information for enhancing aquaculture.

1 CAS Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China. 2 Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China. 3 Tianjin Biochip Corporation, Tianjin 300457, China. 4 Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao 266071, China. 5 University of Chinese Academy of Sciences, Beijing 100049, China. 6 School of Life Sciences, The Chinese University of Hong Kong, Shatin, N.T. 999077, Hong Kong SAR. 7 Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China. 8 Key Laboratory for Sustainable Utilization of Marine Fisheries Resources of Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao 266071, China. 9 College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China. 10 Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC V5A 1S6, Canada. 11 Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA. 12 Laboratory of Aquaculture & Artemia Reference Center, Ghent University, Coupure Links 653, Gent 9000, Belgium. 13 Department of Life Sciences and the National Institute for Biotechnology, Negev Ben Gurion University, Beer Sheva 84105, Israel. 14 Environmental Genomics Inc., P.O. Box 196 Southborough, MA 01772- 1801, USA. 15 School of Fisheries, Aquaculture and Aquatic Sciences, Auburn University, Auburn, AL 36849, USA. 16 College of Life Sciences, Nankai University, Tianjin 300071, China. These authors contributed equally: Xiaojun Zhang, Jianbo Yuan, Yamin Sun, Shihao Li, Yi Gao, Yang Yu. Correspondence and requests for materials should be addressed to K.H.C. (email: [email protected]) or to B.L. (email: [email protected]) or to F.L. (email: [email protected]) or to J.X. (email: [email protected])

NATURE COMMUNICATIONS | (2019) 10:356 | https://doi.org/10.1038/s41467-018-08197-4 | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-018-08197-4

rustacea is a dominant and diverse group of aquatic ani- Table 1 Summary of L. vannamei genome assembly mals, among which the decapods (order ) C fi comprise members of major ecological signi cance in Genome assembly statistics marine and freshwater habitats and include many species with high economic values, such as shrimp, , lobsters, and cray- Total length 1,663,559,157 bp Number of scaffolds 4683 fish. In particular, the penaeid shrimp (family Penaeidae) repre- fi Longest scaffold 3,458,369 bp sents the most important group in sheries and aquaculture, and Contig N50 length 57,650 bp 1 has therefore attracted considerable research attention . Scaffold N50 length 605,555 bp Penaeids predominantly inhabit shallow seas in tropical and Scaffold N90 length 204,841 bp subtropical areas. Their life history typically involves migration Genome characteristics between offshore and inshore waters. After spawning, the GC content 35.68% planktonic shrimp larvae in offshore waters develop into Content of transposable elements 16.17% postlarvae, which move to inshore waters and become benthic. Content of SSRs 23.93% Thus the shrimps have to adapt to different environments in Predicted protein-coding gene number 25,596 their life but the underlying biological mechanisms have seldom Predicted noncoding RNA gene number 2777 Quantity of scaffolds anchored on linkage groups 3275 been explored. Similar to all , penaeids exhibit dis- Length of scaffolds anchored on linkage groups 1,449,959,403 bp continuous growth through intermittent molting. Different from most insects, where molting usually occurs at the larval SSR simple sequence repeat, GC Guanine/Cytosine stages for growth and metamorphosis, molting occurs throughout the lifetime of . For instance, the penaeid shrimp experiences about 50 molts during a lifetime2, Results far more than in other arthropods3. Thus, shrimps could be an Genome sequencing assembly and annotation. The L. vannamei ideal model in studying the molting process. However, the genome size was measured to be 2.45 Gb by flow cytometry detailed mechanisms of molting regulation are far from (Supplementary Fig. 1), similar to the size estimated by k-mer understood. analysis (2.60 Gb, Supplementary Fig. 2). The sequencing and With an annual capture production of ~1.3 million tonnes, assembly of the L. vannamei genome were challenging due to the penaeids comprise the bulk of the global shrimp catch. Shrimp highly abundant SSRs as suggested by genome survey analysis13. farming began in the 1970s, and has been one of the fastest We therefore applied a variety of sequencing technologies, which growing sectors of the rapidly expanding aquaculture industry. generated 828 Gb of Illumina clean sequences (338×), 133 Gb of The penaeid shrimp farming production reached >5 million PacBio long reads (54×), and 34,266 BAC end sequences (0.46×) tonnes, valued at over US$32 billion in 20164.ThePacific white (Supplementary Tables 1−3). We also conducted numerous shrimp Litopenaeus vannamei is the most important cultured conventional approaches for genome assembly, which yielded shrimp worldwide, accounting for ~80% of total cultured unsatisfactory results (Supplementary Table 4). Finally, we penaeid shrimp production. While its native range is in the East developed the WTDBG approach, which uses a fuzzy Bruijn Pacific Ocean, L. vannamei is now widely farmed in Central and graph method, to obtain the best assembly with the highest South America and in Asia, particularly in China, Indonesia, continuity and accuracy for the genome, that contains 1.66 Gb in Thailand, and Vietnam, with a number of breeding lines 4683 scaffolds, with contig N50 of 57.65 Kb and scaffold N50 of available. These lines are mostly produced through traditional 605.56 Kb (Table 1). The assembly was comparable to, or better selective breeding procedures, and genomic information would than, those of other crustaceans, including the recently published be extremely useful for future genetic manipulation. As early as genome of the marbled crayfish (scaffold N50 of 39.40 Kb)10 1997, in an international workshop on genome mapping of (Supplementary Table 5). aquaculture , the penaeid shrimp was identified as one The assembly showed high integrity and quality. Over 93% of of five target organisms for genome sequencing, together with Illumina reads could be mapped to the genome (Supplementary salmon, catfish,tilapia,andoyster5. The genomes of the latter Table 6). The genome covered >94% of the unigenes assembled four species have been published over the past decade6–9,but from RNA-Seq data and 94.76% of the conserved core eukaryotic no complete genome of the shrimp has been reported to date, genes (Supplementary Tables 7 and 8). The accuracy of our despite the efforts of a number of major research groups. For assembly is further substantiated by the 14 sequenced BAC other crustaceans, high-quality genome assembly is only clones, which were completely covered by corresponding available for the water flea Daphnia pulex,theamphipodPar- scaffolds with high synteny (Supplementary Figs. 3 and 4). To hyale hawaiensis, and the marbled crayfish Procambarus virgi- assemble chromosome-level sequences, a high-density linkage nalis10–12. map was used to anchor the scaffolds13. A total of 3275 scaffolds Here we present a high-quality de novo reference genome were anchored to 44 pseudochromosomes, representing 87.34% assembly for L. vannamei, with an extremely high proportion of of the total genome (Fig. 1a, Supplementary Table 9). simple sequence repeats (SSRs), which have been the major The k-mer analysis indicated that repetitive sequences obstacle to genome sequencing and assembly. A prominent account for ~78% of the genome (Supplementary Fig. 2), which characteristic of the genome is the expansion of a series of genes is more than that identified in the final assembly (49.38%) related to visual system, nerve signal conduction, and loco- (Supplementary Table 10), indicating that some repetitive motion, which apparently better equip the shrimp to adapt to sequences are still missing from the assembly. The genome its benthic habitat. The genome also sheds light on regulation of contains the highest proportion of SSRs (23.93%) among all frequent molting through an intensified ecdysone signal path- sequenced genomes (Fig. 1b, Supplementary Table 11). way, with precise control by hormones and environmental The mean length of SSRs is 72.21 bp, which is over twice as long factors. Moreover, our genomic analyses reveal that selective as those in other arthropods (20.11−31.91 bp). The SSRs breeding has exerted a significant impact on the genome of L. density (3315.23/Mb, one SSR per 301 bp) in the L. vannamei vannamei broodstocks and the genetic resources acquired from genome is also, as far as we are aware, the highest among other this study will be useful for further genetic improvements in reported genomes, except Pediculus humanus (4508.69/Mb) shrimp culture. (Fig. 1b). Dinucleotide repeats are the dominant type of SSRs,

2 NATURE COMMUNICATIONS | (2019) 10:356 | https://doi.org/10.1038/s41467-018-08197-4 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-018-08197-4 ARTICLE

5000 a b Crustaceans LG44 LG1 4500 P. humanus Vertebrates

30 LG2 20 10 (10.52%) 0 10 50 20 0 30 LG43 40 40 Insects 50 30 1 0 20 10 LG3 4000 10 0 Arachnida 10 70 0 2 20 60 30 Other 50 40 3500 50 40 LG4 LG42 60 30 3 20 0 LG5 3000 L. vannamei 10 10 (23.94%) 0 0 30 LG6 10 2500 20 4 0 LG4110 10 LG7 0 20 2000 40 5 0 H. robusta 30 10 LG8 1500 20 (6.36%) LG40 Density (counts/Mbp) 0 10 6 Ϲ 10 1000 10% 0 D. rerio 10 P 0 Ϲ 7 P LG9 5% H A P D. pulex (2.85%) LG39 10 500 0 A O P Ϲ O O 2% A P J P P 20 (0.78%) 30 A J J H Ϲ O P 1% H J O A 20 J 8 O 30 0 J O J H J J J O H LG3810 H 40 J 0 10 20 30 40 50 60 70 80 0 O 0 LG10 C J C O 9 C J 10 30 JJ Average length H 20 20 J 10 J J c J 30 1000 LG37 10 P H 0 LG11 0 AT 30 P P 10 CC 20 (10.21%) 20 HH CCC 800 LG12 10 CC 0 LG36 CCC J 0 JJJJJ 10 P P AC PCCC 10 0 (4.75%) OO 600 0 PP LG35 10 PP P AG Ϲ10% J 20 LG34 30 P (4.52%) P 30 LG13 20 400 H 10 H J 0 H J 0 J 0 P H LG14 LG33 20 J 10 Density (counts/Mbp) H JO Ϲ H O ATAG 4% 10 H 20 200 AAT H (0.43%) 0 P 30 CG J H (0.88%) AACCT 60 P J Ϲ1% P P 40 LG15 (0.02%) Ϲ P (0.07%) 0.4% 50 Ϲ0.1% 40 P 0 0 LG32 J O A 10 30 H A 20 40 60 80 100 120 H P H A 20 H H 20 J A P H J H P A O A 30 H 10 P P O A Average length

H 0 A P 40 J P A LG16

A P 40 0 d 30 10 LG31 iii 20 20 10 30 0 40

20 LG17 0 LG30 10 10 0 0 40 10 30 20 LG18 LG29 20 10 0 0 0 10 LG19 40 20 30 30 20 LG21 0 10 LG28 0 10 0 10 30 20 LG20 20 30 0 10 0 0

10

20 30 30

20 0

10

LG27 40 LG22 LG24 LG26 LG25 LG23

Fig. 1 The genomic characteristics of L. vannamei. a A schematic representation of the genomic characteristics of L. vannamei. Track 1 (from the outer-ring): 44 linkage groups (LGs) of the shrimp genome. Track 2: Scaffolds anchored to each linkage group. Track 3: Protein-coding genes present in the scaffolds. Red represents genes on forward strand and green for genes on reverse strand. Track 4: Distribution of gene density with sliding windows of 1 Mb. Higher density is shown in darker red color. Track 5: Distribution of GC content in the genome. Track 6: Distribution of SNPs density with sliding windows of 5 Mb. Higher density is shown in darker blue color. Track 7: Distribution of six significantly expanded gene families in the genome, which are opsin (O), peritrophin-like protein (P), chitinase (H), calcified cuticle protein (A), crustacyanin (C), and JHE-like carboxylesterase 1 (J). Track 8: Distribution of SSRs in the genome. Higher density of SSRs is shown with deeper color. Track 9: Distribution of miRNA in the genome. Clusters of co-transcribed miRNAs at adjacent positions are displayed in stacked style. Track 10: Schematic presentation of major interchromosomal relationships in the shrimp genome. b SSR content among different animal genomes. c Distribution of different types of SSRs in the L. vannamei genome. d Fluorescence in situ hybridization (FISH) of

(AACCT)n type SSR to the nucleus (i) and the chromosomes (ii) of L. vannamei. SNP single nucleotide polymorphism, SSR simple sequence repeat

with (AT)n,(AC)n,and(AG)n accounting for 81.40% of total vannamei has a longer average exon size (259 bp) and more exons SSRs (Fig. 1c, Supplementary Fig. 5). Most SSRs were located in per gene (5.94) (Supplementary Fig. 9, Supplementary Table 13). intergenic regions (24.63%) and introns of protein-coding We also annotated noncoding small RNAs, including 1458 genes (22.07%), and far fewer were found in exons (1.41%). tRNAs, 464 rRNAs, 296 small nuclear RNAs (snRNAs), 255 small fi fl (AACCT)n, a telomere component identi ed by uorescence nucleolar RNAs (snoRNAs), 90 ribozymes, and 214 microRNAs in situ hybridization (Fig. 1c), is longer than many other SSRs, (miRNAs) (Supplementary Table 14). with the longest SSR (13,769 bp), belonging to this type (Supplementary Fig. 6). Furthermore, the length of (AACCT)n located in introns is longer than those found in other genomic Genome evolution. Phylogenetic analysis indicates that crusta- regions (Supplementary Fig. 7). ceans and hexapods form a monophyletic group of Pancrustacea, Transposable elements (TEs) account for 16.17% of the L. with the latter nested within the former, making Crustacea vannamei genome, with DNA transposons (9.33%) and long paraphyletic (Fig. 2a). As members of , L. vannamei interspersed elements (LINEs, 2.82%) comprising the two major (Decapoda) and P. hawaiensis (Amphipoda) diverged at about classes. En-Spm (6.39%) was found to be the most abundant TE, 240 MYA (million years ago), i.e., in early Mesozoic. This with its abundance markedly higher than in D. pulex (0.05%), P. – divergence occurred after the Permian-Triassic mass extinction, virginalis (0.01%) and P. hawaiensis (0.25%)10 12 (Supplementary when about 96% of marine species became extinct14, followed by Table 12). Most TEs in L. vannamei showed higher divergence the radiation of many new life forms. This divergence period is (substitution rate 19−33%) than those in other crustaceans also consistent with the radiation of shrimp-like decapods in the (Supplementary Fig. 8). However, LINEs in L. vannamei, early Middle Triassic15. We found three prominent genome especially RTE-BovB and Penelope, showed a low divergence characteristics from L. vannamei that might underlie the rapid (Supplementary Fig. 8). evolution of penaeid shrimp, namely, abundant SSRs, a high In total, 25,596 protein-coding gene models were annotated in proportion of taxon-specific genes, and extensive tandem gene L. vannamei. Compared to D. pulex and P. hawaiensis, L. duplications.

NATURE COMMUNICATIONS | (2019) 10:356 | https://doi.org/10.1038/s41467-018-08197-4 | www.nature.com/naturecommunications 3 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-018-08197-4

a Anopheles.gambiae +214/–1662 Gene families +246/–410 Expansion/contraction Drosophila melanogaster +143/–207 +2998/–307 Bombyx mori +611/–122 +1064/–1284 Tribolium castaneum +1004/–298 +1734/–616 Insecta Apis mellifera +2/–38 +2267/–847 Locust migratoria +369/–816 +27/–759 +118/–195 Zootermopsis nevadensis +131/–705 +267/–403 Pediculus humanus +94/–1495 Acyrthosiphon pisum +27/–216 +2442/–1303 Daphnia pulex Crustacea +630/–1141 +0/–4 +127/–697 Litopenaeus vannamei +762/–982 +43/–369 Malacostraca Parhyale hawaiensis +411/–766 MRCA Arachnida Ixodes scapularis (7174) +396/–964 +118/–1224 Tetranychus urticae Myriapoda +1353/–1515 Strigamia maritima +664/–523

674.513 539.831 405.189 329.079 239.892 200.146 163.044 94.545 47.3028 0.001 million years ago

bc S. maritima All genes I. scapularis 1:1 Orphan genes 1 T. urticae X:X P. hawaiensis Pancrustacea-specific

L. vannamei Insecta-specific

D. pulex Crustacea-specific 0 Axis 2 A. pisum Malocastraca-specific

L. migratoria Patchy B. mori Species-specific –1 D. melanogaster

0 5000 10,000 15,000 20,000 25,000 30,000 35,000 Gene number –2 –1 0 1 Axis 1

Fig. 2 Comparative genomics analysis of L. vannamei and other arthropods. a Phylogenetic placement of L. vannamei in the phylogenetic tree. Numbers on branches indicate the number of gene gains (+) or losses (−). The estimated divergence times are displayed below the phylogenetic tree. Image credits: Gewin V, Martin Cooper, Bernard Dupont, Carlo Brena, Gilles San Martin, David Ludwig, S. Rae. b Comparison of the gene repertoire of ten arthropod genomes. “1:1” indicates single-copy genes, “X:X” indicates orthologous genes present in multiple copies in all the ten species, where X means one or more orthologs per species, “patchy” indicates the existence of other orthologs that are presented in at least one genome. c Principal component analysis (PCA) on relative codon usage of orphan genes

Because of their special mutational and functional qualities, Compared with other crustaceans, the L. vannamei genome has SSRs play a major role in generating genetic variation under- 762 expanded gene families and 16,291 species-specific genes lying adaptive evolution16. Besides L. vannamei,twopenaeid (>57% of the entire gene repertoire, Fig. 2b), including genes species, Penaeus monodon and Marsupenaeus japonicus,also related to myosin complex, chitin binding, metabolic process, and have a high proportion of SSRs (~10%) in their genome, signaling transduction (Supplementary Data 1–2, Supplementary traceable through low coverage genome sequencing17.How- Tables 15–16, Supplementary Fig. 11). Orphan genes can ever, the genomes of the marbled crayfish P. virginalis and the contribute to lineage-specific adaptation19. A total of 3369 amphipod P. hawaiensis contain only 0.99 and 1.27% of SSRs, orphan genes were identified in the L. vannamei genome. They respectively (Supplementary Table 12). Therefore, the expan- displayed a lower number of exons (4.47 exons/gene), but a sion of SSRs might stem from the common ancestor of the greater length of exons (292 bp/gene) in contrast to the average penaeid shrimp. SSRs play critical roles in regulating genome length of all genes (5.94 exons and 260 bp per gene). They plasticity (including DNA recombination and replication) and exhibited special gene structure characters, and special temporal/ gene expression18.SSRsinL. vannamei were widely distributed spatial expression patterns, suggesting their independent de novo among introns (22.07%) in 16,741 genes (65.47%), whose origins (Fig. 2c, Supplementary Figs. 12 and 13). When searched expression may be regulated by SSR polymorphism. SSRs may against the transcriptome unigenes of other decapods, the orphan also contribute to DNA recombination with TEs, as most TEs genes were found to be present in other penaeids, such as and their up- or downstream sequences contain SSRs Fenneropenaeus chinensis (83.07%) and P. monodon (64.59%), (Supplementary Fig. 10), which make up more than 90% for but rarely present in other decapods, e.g. Exopalamon carincauda ERV1,Charlie,Sola,MuDR,andEn-Spm.Thus,thesignificant (1.95%), Eriocheir sinensis (1.09%), and Neocaridina denticulata expansion of SSRs might provide a unique genetic architecture (3.82%), suggesting that these orphan genes are lineage-specific, for the shrimp’s adaptive evolution. and may contribute to penaeid-specific adaptation.

4 NATURE COMMUNICATIONS | (2019) 10:356 | https://doi.org/10.1038/s41467-018-08197-4 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-018-08197-4 ARTICLE

a c Scaf_2598

20,000 120,000 220,000

Scaf_2919

230,000 330,000 430,000

4

7

9

6

1

N

A Daphnia pulex Dappu1|47925

V

DaphniaDaphnia pulex Dappu1|23519pulex Dappu1|2566 L

Daphnia pulex Dappu1|24264 i Limulus polyphemusDaphnia ANF89421.1 pulex Dappu1|47520 e

Daphnia pulex Dappu1|47330 m Limulus polyphemusDaphnia pulex ANF89420.1 Dappu1|223107

a Limulus polyphemus AIT75833.1 n n

a

v

Daphnia pulex Dappu1|307122 Daphnia pulex Dappu1|312425 s u

e

Daphnia pulex Dappu1|97105 a

Daphnia pulex Dappu1|303264 n

Daphnia pulex Dappu1|243539 e

p

o

t

Daphnia pulex Dappu1|51511 i Litopenaeus vannamei LVAN16960 L Litopenaeus vannamei LVAN16963 Litopenaeus vannamei LVAN16953 Litopenaeus vannamei LVAN16987 Daphnia pulex Dappu1|51251 Litopenaeus vannamei LVAN16986 Litopenaeus vannamei LVAN16964 Litopenaeus vannamei LVAN16968 Litopenaeus vannamei LVAN16973 Limulus polyphemus ANF89424.1 Litopenaeus vannamei LVAN19552 Daphnia pulex Dappu1|60874 Litopenaeus vannamei LVAN19553 Litopenaeus vannamei LVAN16995 Litopenaeus vannamei LVAN16992

Limulus polyphemus ANF89422.1 2 Litopenaeus vannamei LVAN16991

8 Litopenaeus vannamei LVAN16994 80 83 Litopenaeus vannamei LVAN19562 Limulus polyphemus ANF89423.1 71 Litopenaeus vannamei LVAN19560 Litopenaeus vannamei LVAN19559 98 Litopenaeus vannamei LVAN19558 Bombus impatiens AAV67326.1 100 Litopenaeus vannamei LVAN19579 88 89 Litopenaeus vannamei LVAN19564 100 Litopenaeus vannamei LVAN19568 Cataglyphis bombycinus AAC05091.1 100 100 99 99 Litopenaeus vannamei LVAN19580 98 Culex quinquefasciatus XP 001851157.1 100 Litopenaeus vannamei LVAN19573 Apis mellifera AAC13418.1 86 93 Litopenaeus vannamei LVAN19572 Ptero RGR Arth Rh7 Visual Total 100 Litopenaeus vannamei LVAN19576 b Anopheles gambiae XP 001688790.1 97 64 Litopenaeus vannamei LVAN19575 73 Litopenaeus vannamei LVAN19578 Litopenaeus vannamei LVAN19577 98 Neogonodactylus oerstedii ACU00210.1 100 53 0 42 42 Litopenaeus vannamei Papilio xuthus BAA93470.1 51 LW type 98 66 Neogonodactylus oerstedii ABG37007.1 Crustacea 0 0 0 98 100 75 62 Neogonodactylus oerstedii ACU00211.1 DrosophilaManduca melanogaster sexta NPAAD11965.1 476701.1 100 67 Neogonodactylus oerstedii ABG37009.1 98 92 100 Coronis scolopendra ACU00198.1 Daphnia pulex Drosophila melanogaster AAA28854.1 100 Neogonodactylus oerstedii ACU00218.1 7 0 7 1 20 35 73 100 Neogonodactylus oerstedii ACU00213.1 68 57 93 Neogonodactylus oerstedii ACU00214.1 100 Megoura viciae AAG17120.1 51 80 Neogonodactylus oerstedii ACU00220.1 1 10 Acyrthosiphon pisum Daphnia pulex Dappu1|303450 Neogonodactylus oerstedii ACU00212.1 0 0 5 4 Neogonodactylus oerstedii AIF73509.1 99 98 Lysiosquillina maculata ADT71767.1 92 99 73 None visual type Squilla empusa ACU00238.1 Litopenaeus vannamei LVAN07316 93 76 Coronis scolopendra ACU00194.1 19 Sympetrum frequens 100 95 1 0 1 1 16 68 95 Coronis scolopendra ACU00195.1 Insecta Litopenaeus vannamei LVAN11666 87 Coronis scolopendra ACU00189.1 Procambarus clarkii ALJ26468.1 96 Odontodactylus scyllarus ACU00227.1 2 2 Pediculus humanus 50 96 86 Neogonodactylus oerstedii ACU00217.1 0 0 0 0 Neogonodactylus oerstedii AIF73508.1 98 82 Neogonodactylus oerstedii ACU00221.1 64 Gonodactylus smithii ACU00203.1 Branchinella kugenumaensis BAG80984.1 94 Neogonodactylus oerstedii ACU00215.1 SW type SW 53 87 94 Procambarus clarkii AAB25036.1 0 0 0 2 2 Zootermopsis nevadensis Triops longicaudatus BAG80983.1 100 0 73 100 Procambarus clarkii ALJ26467.1 Triops granarius BAG80978.1 74 99 Neogonodactylus oerstedii ABG37008.1 70 74 MW type 98 Neogonodactylus oerstedii ACU00222.1 Daphnia pulex Dappu1|214454 73 95 1 0 0 0 6 7 Apis mellifera 99 66 Neogonodactylus oerstedii ACU00224.1 87 Litopenaeus vannamei LVAN07187 Drosophila melanogaster AAC47426.1 99 Litopenaeus vannamei LVAN19016 Apis mellifera AAC13417.1 58 99 79100 Litopenaeus vannamei LVAN11245 2 0 0 0 3 5 Tribolium castaneum 99 81 Litopenaeus vannamei LVAN19546 Schistocerca gregaria CAA56378.1 100 Manduca sexta AAD11964.1 Bombyx mori NP 001036882.1 Anopheles gambiae XP 319247.1 100 94 Papilio xuthus BAA31722.1 1 0 0 1 4 Bombyx mori Apodemia mormo AAT91643.1 100 100 Cataglyphis bombycinussp Q17296.1 6 Papilio xuthus AA93469.1 Athalia rosae BAG32517.1

99 50 Sphodromantis sp. CAA50658.1 100 86 Manduca sexta AAD11966.1 100 78 99 100 Homalodisca vitripennis AAT01077.1 0 0 6 92 66 74 Megoura viciae AAG17119.1 0 1 7 Drosophila melanogaster 100 Apis mellifera NP 001071293.1 Drosophila melanogaster CAB06821.1 Limulus polyphemus AEL29244.1 50 84 100 Calliphora vicina AAA62725.1 100 Plexippus paykulli BAG14335.1 99 Drosophila melanogaster NP 524407.1

5 100 99 Drosophila melanogaster AAA28734.1 0 3 1 99 9 0 1 5 Tetranychus urticae Hasarius adansoni1 BAG14332. 100 Hasarius adansoni BAG14330.1 100 100 99 99 Plexippus paykulli BAG14333.1 Leptuca pugilator ADQ01811.1 92 90 8 Limulus polyphemus AIT75832.1 100 100 9 99 Hasarius adansoni BAG14331.1 Chelicerata 79 Plexippus paykulli BAG14334.1 100 88 Limulus polyphemus P35361.1 2 3 100 Limulus polyphemus P35360.1 1 0 7 13 Limulus polyphemus Vargula tsujii ADZ45236.1 100 69 Limulus polyphemus AAA02498.1 Limulus polyphemus AIT75830.1 90 Limulus polyphemus AIT75831.1 Litopenaeus vannamei LVAN03914 Branchinella kugenumaensis BAG80985.1 100 Triops longicaudatus BAG80982.1 Triops granarius BAG80977.1 Triops granarius BAG80980.1 Triops granarius BAG80979.1 Daphnia pulex Dappu1|321382 Daphnia pulex Dappu1|43742

Branchinella kugenumaensis BAG80986.1 Daphnia pulex Dappu1|216106

Daphnia pulex Dappu1|305803 pulex Daphnia Daphnia pulex Dappu1|326259

Daphnia pulex Dappu1|106095 pulex Daphnia Daphnia pulex Dappu1|305771 Daphnia pulex Dappu1|326260Daphnia pulex Dappu1|54168 D

Litopenaeus vannamei LVAN16395 D a

Litopenaeus vannamei LVAN23136 a p

p h

LitopenaeusLeptuca vannamei pugilator LVAN16394 ADQ01810.1 h n

n i

i a

d MW type LW type a

p

Charybdis japonica AEL33282.2 p u

Gelasimus vomeris ACT31581.1 u l

l Hemigrapsus sanguineus BAA09133.1 e

e x

Leptuca pugilator ADQ01809.1 x

D

D a

Gelasimus vomeris ACT31580.1 a p

Hemigrapsus sanguineus BAA09132.1Daphnia pulex EFX63570.1 p p

p u Litopenaeus vannamei LVAN03731 u

Daphnia pulex Dappu1|67015 1

Litopenaeus vannamei LVAN10254 1 |

| 1

10 Litopenaeus vannameiDaphnia LVAN10257 pulex Dappu1|307030 3 9 Daphnia pulex Dappu1|307031 0

Daphnia pulex Dappu1|306275 8

5 Triops granarius BAG80976.1

Daphnia pulex Dappu1|93844 3 Nauplius 7

Daphnia pulex Dappu1|93838 8

7

Daphnia pulex Dappu1|302464 5 Triops longicaudatus BAG80981.1 2 FPKM

Limulus polyphemus ACO05013.1 5 Limulus polyphemus ACO05013.1(2) Zoea e

Dmel30795 Dmel12680 Mysis 0 Dmel25039 Dmel25041 Dmel25038 LVAN21859Dmel27163 LVAN02392 LVAN10232 LVAN02799 Dmel1959 Dmel34167 Dmel1960 Dmel26577 LVAN15282 Dpul18536 Dmel12174 Dmel12175 LVAN12635 LVAN22882 Dmel30962 Dmel5966 phaw027487 LVAN10774 phaw020448 LVAN06344 LVAN03534 LVAN06649 LVAN22878 LVAN06252 Dpul202 LVAN22888 LVAN06253 Dpul18958 phaw021281 phaw018972Dmel30959 Dmel30886 Dmel30887 83 Dmel30975 76 94 Dmel30976

100 Dmel30974 Dpul15082 100 Dmel22237 LVAN23534Dpul169 Dmel22238 Dmel25149 74 93 Dmel21412 100 Dpul18602 phaw026878 Postlarvae Dmel22814Dpul6200 95 phaw024398 −5 100 100 92 Dmel34408 Dmel22813 LVAN21102 100 Dmel34407 94 Dmel34409 phaw023636 100 Dmel34531 LVAN03914 LVAN16394 LVAN23136 LVAN16395 LVAN19579 LVAN19578 LVAN16992 LVAN19016 LVAN11245 LVAN16963 LVAN16968 LVAN16991 LVAN16987 LVAN19564 LVAN19546 LVAN19568 LVAN07187 LVAN19580 phaw010974 96 Dmel34532 LVAN07986 Dmel34533 100100 phaw013979 LVAN18868 100 10090 Dpul14117 LVAN07511 100 Dpul14114 70 Dmel7672 LVAN07512 65 100 100 Dmel7673 LVAN01608 80 Dmel6298 LVAN13509 100 100 phaw024075 LVAN16820 87 Dmel2382 phaw027851 66 67 LVAN15645 GluR LVAN11223 97 phaw019170 94 92 Dmel7364 LVAN19870 Dmel7365 phaw027039 Dmel7367 LVAN14964 99 100100 Dmel7366 LVAN12493 10099 LVAN18423 kainate LVAN18141 88 phaw019047 phaw025134 Dpul4266 89 phaw008154 phaw024693 100 phaw026123 LVAN04550 LVAN15644 LVAN21622 LVAN12386 LVAN03461 95 85 7468 LVAN15696 LVAN03460 62 72 90 phaw009436 LVAN07247 LVAN07573 phaw001653 LVAN05382 phaw009324 LVAN05383 LVAN22887 LVAN23423 97100 LVAN04008 Dmel17456 LVAN00166 LVAN01709 LVAN19865 LVAN19196 LVAN19215 Expression level: log phaw001889 LVAN21103 Dmel409 f phaw005838 Dmel413 LVAN22883 phaw016667 100 Dmel407 8 LVAN12361 100 Dmel412 phaw005566 100 Dpul1335 LVAN00786 100 Dpul12407 LVAN22882 LVAN00782 100 LVAN02393 100 phaw025152 LVAN01218 LVAN18422 LVAN02872 61 phaw028489 LVAN15282 LVAN00899 100 phaw013054 6 LVAN03390 phaw015908 phaw017891 LVAN07575 LVAN18925 10099 phaw002268 LVAN06648 LVAN12527 100 100 phaw028617 LVAN09356 phaw003681 LVAN11440 LVAN03879 phaw026879 LVAN11636 100 LVAN13094 LVAN06649 4 LVAN03880 100100 phaw006259 LVAN03881 100 LVAN06648 LVAN02707 84 LVAN11922 LVAN17241 LVAN06345 LVAN01709 LVAN13198 LVAN07463 100 phaw002291 LVAN01864 phaw026934 2 LVAN04010 phaw017924 LVAN22888 LVAN10537 97 LVAN08103 LVAN23106 100 74 phaw016940 LVAN02474 phaw002658 LVAN07574 LVAN25260 phaw016516 LVAN18141 LVAN14963 100 LVAN13866 LVAN16048 100 LVAN13865 2 LVAN16049 phaw016693 0 LVAN18616 LVAN24667 LVAN22878 (FPKM+0.01) LVAN18617 phaw013242 phaw015804 LVAN00164 100 phaw019539 LVAN17381 10081 phaw005229 LVAN00165 LVAN04009 LVAN09357 LVAN17382 100 LVAN07442 LVAN12943 phaw013048 phaw007421 −2 phaw016514 phaw026880 LVAN21823 90 87 66 60 LVAN21048 LVAN13688 100 71 LVAN21814 LVAN15615 GluR LVAN09357 99 10097 phaw009260 phaw011654 phaw025664 100 67 LVAN08104 LVAN06826 phaw017323 LVAN06827 99 96 81 Dmel12493 LVAN06825 99 Dmel12495 LVAN12834 60 LVAN22883 delta LVAN06824 94 96 LVAN19115 −4 LVAN10233 phaw014976 LVAN21051 100 phaw016258 66 LVAN00514 92 81 82 phaw016635 100 99 LVAN03046 LVAN21115 86 phaw004779 LVAN11922 78 LVAN15406 LVAN10044 LVAN06664 LVAN10042 95 phaw026294 100 8598 100 LVAN16047 phaw002349 100 LVAN16045 phaw008218 87 LVAN16044 72 82 phaw005232 LVAN05464 phaw003926 LVAN03833 phaw005552phaw007786 LVAN12527 82 LVAN04021 LVAN06833 −6 100 LVAN15035 LVAN20502 phaw020677 LVAN02783 LVAN09134LVAN03915 LVAN16724 LVAN01259 LVAN05003 LVAN21725 LVAN14325 LVAN12630LVAN01260 LVAN05370 phaw002253 LVAN03845 LVAN05559 phaw018438 LVAN20721 LVAN05558 LVAN12023 LVAN00789 LVAN10371 LVAN03767 LVAN11175 LVAN11176 LVAN21221 LVAN15615 LVAN16271 LVAN19013 phaw023267 phaw004527 LVAN08292 phaw009276LVAN09033 LVAN23561 LVAN18698 LVAN22520 phaw007926LVAN05278 LVAN14785LVAN12834

LVAN10371 LVAN08765

LVAN24668 LVAN07172 LVAN22200

LVAN08764 LVAN01372 LVAN18670 Hc Ant Hp St In Ms Ok Gi Es Br Tg Vn Epi Ht

phaw011663 GluR delta GluR kainate Tissues

Fig. 3 Opsin and iGluR gene family in the shrimp genome. a Picture of the L. vannamei eyes and ommatidia. b Opsin genes of L. vannamei in comparison with those in the genomes of various arthropods. The number of visual and nonvisual type opsin genes, including pteropsin (Ptero), RGR-like (RGR), arthropsin (Arth), Rh7-like (Rh7) were identified using sequence alignment with known opsins and GPCR-domain searches (Supplementary Fig. 18). c Phylogenetic tree of the opsin gene family in arthropods. Six clades of opsins in the L. vannamei genome were observed (red). The genes in largest clade are specifically expanded opsins in the L. vannamei genome, which are also tandemly duplicated. The arrow indicates the transcriptional orientation. d Expression of opsin genes (FPKM>1) during different larval stages of L. vannamei. e Expansion of ionotropic glutamate receptor (iGluR) genes in L. vannamei. f Expression of iGluR genes in different tissues: hemocyte (Hc), antennal gland (Ant), hepatopancreas (Hp), stomach (St), intestine (In), muscle (Ms), lymphoid organ (Ok), gill (Gi), eyestalk (Es), brain (Br), thoracic ganglion (Tg), ventral ganglion (Vn), epidermis (Epi), heart (Ht). GPCR G protein-coupled receptor

Gene tandem duplication was one of the most intriguing myosins, actins and heat shock proteins, which may be important features of the L. vannamei genome. A total of 4662 genes were for adaptive evolution of the shrimp. identified to be tandemly duplicated (Supplementary Fig. 14), and these duplicates share high sequence similarities (identity >98%), indicating recent duplications. Moreover, these genes mainly Visual system for benthic adaptation. L. vannamei has a pair of consist of opsins, crustacyanins, chitinases, cuticle proteins, large, stalked eyes with an ovoid visual surface making up of a

NATURE COMMUNICATIONS | (2019) 10:356 | https://doi.org/10.1038/s41467-018-08197-4 | www.nature.com/naturecommunications 5 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-018-08197-4 highly elevated number of ommatidial lenses1 (Fig. 3a). Each photoreceptors capable of capturing photons most common in compound eye consists of about 55,000–80,000 ommatidia20, this environment21. The genome sequencing of L. vannamei now rivaling the most acute eyes in arthropods. Adult and juvenile L. allows us, for the first time, to infer genomic-level evolution in vannamei are benthos in turbid and shallow waters to a depth of this delicate eye1. We identified 42 opsin genes, which are the key 50 m, where no light with a wavelength <500 nm penetrates. light-sensing factors responsible for visual signal transduction Therefore, L. vannamei are expected to have developed (Fig. 3b). All these opsins belong to the visual type related to color

a c 3 3 1 3 65 3 2 1 0 1 6 8 4 5 1 CHH >100 2 1 0 49 52 2 7 6 6 4 21 11 9 22 4 Br−C

59 52 47 115 190 108 38 37 47 66 82 95 59 164 91 CBP 50 4 9 5 6 18 11 5 5 6 5 7 6 4 15 8 Chitinase

0 0 0 1 24 0 0 0 0 0 0 0 0 0 0 CaCP 0 Smar Turt Isca Phaw Lvan Dpul Apis Phum Znev Lmig Amel Tcas Bmor Dmel Agam Gene number

ep ex en d 2 ed Br-C 0 ba 8 –2 CD0D1D2D3D4P1P2 6 4 –4 CHH b 2 0.4

CHH(MIH) 0 0.2 Expression level: Log −2 −4 0 −6 Ecdysone –0.2 synthesis 4

2 CBP 2 Expression level: Log (FPKM(i)/FPKM(C)) EcR/RXR 0

–1 1 Ecdysone response genes 0 (Br-C, E75, HR3, Ftz-F1...) –1 2 (FPKM+0.01) Chitinase –2 Apoptosis CBP, CaCP, Chitinase, CP 5

0

Molting and epidermis reconstruction CaCP –5 C D0 D1 D2 D3 D4 P1 P2

C D0 D1 D2 D3 D4 P1 P2 –10

e Type III CHH Type II CHH Type I CHH (ITP) (GIH/MIH/MOIH/VIH)

Type Ic CHH Type Ib CHH Type Ia CHH Tip label: L. vannamei genome scaffold: Penaeus Scaffold 290 Scaffold 2780 Macrobrachium Scaffold 1036 Scaffold 2916 Hommarus Scaffold 1179 Scaffold 2938 Eriocheir Scaffold 1388 Scaffold 3152 Discoplax Scaffold 1399 Scaffold 3427 Daphnia Scaffold 3666 Drosophilla Scaffold 2306 Scaffold 2490 Scaffold 4017

Penaeus_monodon-CHH1-like_gene-MF974242 Scaffold 2640 Scaffold 4164 Scaffold 2739 Scaffold 4572 Penaeus_chinensis-CHH2-like_protein-JQ012759 LVANscaffold_4017_1 LVANscaffold_2780 LVANscaffold_1179_1 LVANscaffold_4572_2 LVANscaffold_4164 LVANscaffold_1179_2 LVANscaffold_4572_1 LVANscaffold_290_1 LVANscaffold_290_2 LVANscaffold_2306 LVANscaffold_1399 LVANscaffold_3666 LVANscaffold_4017_2 Penaeus_chinensis-CHH1-like_protein-JQ012758 Penaeus_vannamei-CHH_gene-KJ660843 Penaeus_vannamei-CHH_mRNA-KU174947 Penaeus_vannamei-CHH_mRNA-KJ660842 LVANscaffold_3427 LVANscaffold_2938_1 LVANscaffold_2938_4 LVANscaffold_2938_3 LVANscaffold_2938_2 Penaeus_monodon-CHH-MG888660 Drosophila_melanogaster-ITP_transcript-NM_001043104 Drosophila_melanogaster-ITP_transcript-NM_138087 Drosophila_melanogaster-Ion_transport_peptide_ITP_transcript-NM_001299916_2 Drosophila_melanogaster-ITP_gene_partial_cds.-KX797484 Drosophila_melanogaster-GEO11329p1_mRNA_partial_cds.-KX531271 Drosophila_melanogaster-ITP_transcript-NM_001169823 Drosophila_melanogaster-Ion_transport_peptide_ITP_transcript-NM_001299916 Drosophila_melanogaster-SD05282_full_length_cDNA.-AY060468 Drosophila_melanogaster-ITP_gene_complete_cds.-KX805623 Daphnia_magna-putative_ion_transport_peptide-like_precursor-EF178504 Drosophila_melanogaster-ITP_transcript-NM_001169822 Drosophila_melanogaster-ITP_mRNA_partial-EU290748 Penaeus_vannamei-MIH-like_neuropeptide-S73824 Daphnia_magna-putative_ion_transport_peptide_precursor-EF178503 Penaeus_vannamei-sinus_gland_peptide_A-LC278950 Penaeus_vannamei-sinus_gland_peptide_A_precursor-LC318276 LVANscaffold_1388_2 Penaeus_vannamei-sinus_gland_peptide_A-LC278951 Macrobrachium_rosenbergii-CHH-H_mRNA-AF317554 Macrobrachium_rosenbergii-crustacean_hyperglycemic_hormone_gill-AF372657_2 LVANscaffold_2739 LVANscaffold_3152 Penaeus_japonicus-CHH_mRNA-AB247560 Penaeus_vannamei-ion_transport_peptide-EF156402 Penaeus_vannamei-hyperglycemic_hormone-like_peptide_2-JX856152 Penaeus_vannamei-MIH_gene_partial-AF387485 LVANscaffold_2640_4 Penaeus_vannamei-hyperglycemic_hormone-like_peptide_precursor-JX856151 Penaeus_monodon-CHH1-AF233295 Penaeus_vannamei-sinus_gland_peptide_C_precursor-LC318431 Penaeus_monodon-hyperglycemic_hormone-like_peptide_precursor-MG888661 Penaeus_vannamei-hyperglycemic_hormone-X99731 LVANscaffold_1388_1 LVANscaffold_2490_22 Penaeus_monodon-hyperglycemic_hormone_homolog_precursor-AF104386 LVANscaffold_2490_4 LVANscaffold_2490_5 Penaeus_japonicus-Pej-SGP-I_complete_cds.-AB007507 LVANscaffold_2490_7 LVANscaffold_2490_10 Penaeus_vannamei-sinus_gland_peptide_B-LC278952 Penaeus_vannamei-sinus_gland_peptide_B_precursor-LC318277 LVANscaffold_2490_11 LVANscaffold_2490_12 LVANscaffold_2490_15 Penaeus_vannamei-sinus_gland_peptide_B_precursor-LC318277 LVANscaffold_2490_6 LVANscaffold_2490_1 LVANscaffold_2490_16 LVANscaffold_2490_3 LVANscaffold_2490_17 Penaeus_vannamei-sinus_gland_peptide_B_precursor-LC278952 LVANscaffold_2490_23 LVANscaffold_2490_2 LVANscaffold_2490_20 Penaeus_japonicus-hyperglycemic_hormone-AB035724 Penaeus_japonicus-Pej-SGP-III_complete_cds.-D87864 Penaeus_monodon-CHH_mRNA_partial_cds.-AF104930 LVANscaffold_1036_6 Penaeus_vannamei-sinus_gland_peptide_F-LC278953 Penaeus_vannamei-sinus_gland_peptide_F_precursor-LC318278 Penaeus_monodon-hyperglycemic_hormone_homolog_precursor-AF104389 Penaeus_japonicus-Pej-SGP-V_complete_cds.-AB007508 Eriocheir_sinensis-crustacean_hyperglycemic_hormone_mRNA-JX485644 Homarus_americanus-crustacean_hyperglycemic_hormone_precursor_isoform_A_Homarus-S76846 Macrobrachium_nipponense-hyperglycemic_hormone_gene_complete_cds.-KM657821 Macrobrachium_nipponense-prepro-crustacean_hyperglycemic_hormone-HQ724327 Macrobrachium_lanchesteri-crustacean_hyperglycemic_hormone_CHH-AF088854 Penaeus_monodon-CHH2_mRNA_partial_cds.-AF104931 Macrobrachium_rosenbergii-partial_mRNA_for_Crustacean_Hyperglycemic-AJ130968 Macrobrachium_rosenbergii-crustacean_hyperglycemic_hormone_gill-AF372657 Macrobrachium_rosenbergii-crustacean_hyperglycemic_hormone_mRNA-AF219382 Penaeus_monodon-neuropeptide_Pem-SGP-C2-AB054989 Penaeus_monodon-neuropeptide_Pem-SGP-C1-AB054988 Penaeus_monodon-neuropeptide_Pem-SGP-C2_precursor-AB054188 Penaeus_monodon-PmSGP-III_precursor-AF104388 Penaeus_monodon-neuropeptide_Pem-SGP-C1_precursor-AB054187 Penaeus_monodon-PmSGP-II_precursor-AF104387 LVANscaffold_1036_4 Penaeus_monodon-PmSGP-V_precursor-AF104390 LVANscaffold_1036_5 LVANscaffold_2640_2 Penaeus_vannamei-crustacean_hyperglycemic-AB744717 LVANscaffold_2640_1 LVANscaffold_2916 Penaeus_vannamei-sinus_gland_peptide_G_precursor-LC318279 LVANscaffold_2640_3 LVANscaffold_1036_3 Penaeus_vannamei-GIH_mRNA_complete-KF879913 Penaeus_vannamei-VIH_mRNA-KC962398 Macrobrachium_rosenbergii-sinus_gland_peptide_B_precursor_mRNA-AF432347 LVANscaffold_1036_1 LVANscaffold_1036_2 Penaeus_monodon-GIH_mRNA_complete_cds.-KF500066 Penaeus_monodon-gonad_inhibiting_hormone_precursor-JN836930 Penaeus_monodon-gonad_inhibiting_hormone-KT906363 Eriocheir_sinensis-putative_MIH_mRNA-AF312975 Penaeus_semisulcatus-MIH_mRNA_complete-KF031444 Penaeus_semisulcatus-MIH_gene_complete-AB810255 Penaeus_chinensis-putative_MIH-AF312977 Penaeus_semisulcatus-GIH_mRNA-LC177258 Penaeus_indicus-GIH_mRNA_partial-KM241870 Penaeus_stylirostris-putative_MIH-AF312976 Penaeus_indicus-preprogonad_inhibiting_hormone_mRNA-KT373905 Macrobrachium_nipponense-gonad-inhibiting_hormone_mRNA_complete-HQ724326 26 Macrobrachium_nipponense-gonad_inhibiting_hormone_gene_complete-KM657819 Penaeus_japonicus-Pej-SGP-IV_complete_cds.-AB004652 Discoplax_celeste_moult_inhibiting_hormone_XO_preproprotein_AEM45616 Macrobrachium_malcolmsonii-gonad_inhibiting_hormone_mRNA-KY412853 Homarus_americanus-prepro-gonad-inhibiting_hormone.-X87192 Macrobrachium_rosenbergii-sinus_gland_peptide_A_precursor_mRNA-AF432346 Macrobrachium_nipponense-molt_inhibiting_hormone_gene_complete-KM657820 Macrobrachium_nipponense-molt_inhibiting_hormone_mRNA_complete-KF878973 Macrobrachium_nipponense-molt_inhibiting_hormone-like_protein_mRNA-HQ724325 Bootstrap Macrobrachium_rosenbergii-molt_inhibiting_hormone-like_protein-KC990939 support Penaeus_vannamei-molting-inhibiting_hormone-like_protein-MF358695 100 0.3

6 NATURE COMMUNICATIONS | (2019) 10:356 | https://doi.org/10.1038/s41467-018-08197-4 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-018-08197-4 ARTICLE sensation. To our knowledge, L. vannamei has the most visual Unlike vertebrates and most protostomes that use acetylcholine opsin genes among animals, including creatures generally regar- as a neurotransmitter, arthropods use glutamate at the neuro- ded as having excellent vision such as mantis shrimps (6–33 muscular junction29. A total of 169 ionotropic glutamate opsins) and dragonflies (20 opsins)22. The 42 visual-opsins can be receptors (iGluRs) and 148 glycine receptors (GlyRs) genes are divided into three subfamilies, 33 in the long-wavelength type present in the L. vannamei genome (Fig. 3e, Supplementary (LW) subfamily, seven in the middle-wavelength type (MW) Fig. 16, Supplementary Table 18). iGluRs are involved in subfamily and two in the short-wavelength type (SW) subfamily excitatory neurotransmission, which can occur quickly and last (Fig. 3c). Intriguingly, most of the LW subfamily genes belong to for a long time. They also play a role in regulating myelination30. two clusters in the genome (Fig. 3c), suggesting a major expan- GlyRs can activate Cl− conductance in neurons31. Furthermore, sion via tandem duplication. iGluRs and GlyRs genes are mainly expressed in the nervous The expansion of LW opsins sensitive to light at the 560 nm system as well as in muscle (Fig. 3f, Supplementary Fig. 17). In wavelength facilitates feeding of the shrimp in benthic environ- addition, some gene families involved in muscle contraction, such ment with predominantly yellow and red (long-wavelength) as actins and myosins, are also significantly expanded in the L. light21. Expansion of opsin genes was also observed in the vannamei genome (Supplementary Table 16). These features genome of the water flea D. pulex (36 opsins), but this expansion might equip the shrimp to rapidly perform neural signal is restricted to the MW subfamily, which is sensitive to light at transduction, enhancing its responsiveness and strengthening its ~530 nm wavelength associated with planktonic habitat (Supple- locomotion, including escape reactions in adaptation to its mentary Fig. 15). Considering the shift from planktonic to benthic swimming life in shallow seawater. benthic habitat during the shrimp’s life cycle, we analyzed the expression of opsin genes at different larval stages of L. vannamei. fi fi Intensi ed ecdysone signal pathway. Molting is a basic phy- Interestingly, LW-type opsin genes showed signi cantly higher siological process of crustaceans (Fig. 4a), through which they expression levels in the benthic postlarvae, whereas the MW realize metamorphosis, growth, and development. An ecdysone opsin genes displayed higher expression in the planktonic larvae signal pathway, under hormonal regulation, controls every step in (Fig. 3d). These data provide strong support for the correlation of molting (Fig. 4b). Both an initial rise and a coordinated decline of differential expression among the types of visual opsin genes with circulating ecdysone concentration are necessary for successful the planktonic-benthic habitats, suggesting that L. vannamei has molting32.InL. vannamei, the immediate function of ecdysone in a highly effective visual system adapted to its habitat transition. promoting molting is amplified by expanded downstream genes in the ecdysone signal pathway. After entering the nucleus, Nerve signal conduction and locomotion. In the central nervous ecdysone activates successive transcription of early stage ecdysone system of decapods, the giant nerve fibers originating from the response genes (Fig. 4b). The major ecdysone response gene, brain and terminating at the caudal ganglion are responsible for Broad-Complex (Br-C), was perceptibly expanded in the genome the rapid escape reactions23. These nerve fibers of penaeid shrimp (Fig. 4c), and were highly expressed at early premolt (D1) and have a conduction speed of ~200 ms–1, generally regarded as the postmolt (P1) (Fig. 4d), consistent with their function in reg- fastest conducting speed in animals24, compared with a mere 20 ulating the expression of downstream effectors, such as chitinase ms−1 for squids, which have the largest nervous system among and cuticular proteins33. Cuticle proteins, chitin binding proteins invertebrates25,26. In the L. vannamei genome, the gene categories (CBPs) and calcification-related cuticular proteins (CaCPs) are for signal recognition and neural development are significantly main structural proteins for constructing the cuticle34, which enriched (Supplementary Data 1, Supplementary Tables 16). protects the shrimp from bacteria, fungi, viruses, and mechanical Marked expansion was found in several gene families, including injuries. However, to allow for growth in body size, the cuticle is 457 G protein-coupled receptors (GPCRs), 53 transient receptor degraded prior to ecdysis by chitinase34. Genes encoding CBPs, potential channels, 21 innexins, and 47 protocadherins (Supple- CaCPs, and chitinase are significantly expanded in the L. van- mentary Table 17), which are considered as the core neural- namei genome (Fig. 4c, Supplementary Table 19). These genes development-related genes across bilaterians26. The rapid nerve were mainly expressed in the epidermis and intestinal peritrophic conduction in shrimps is based on their heavy myelin sheath27, matrices (Supplementary Fig. 19). The structural proteins were but the myelin substrate, cholesterol, cannot be synthesized in coordinately expressed during molting, which began with the most invertebrates25,28. Crustacyanins, the apolipoprotein family initial degradation by chitinase, and then the synthesis of new genes which transport cholesterol, are significantly expanded and cuticle with high expression of Rebers and Riddiford 2 (R&R2)- tandemly duplicated in the L. vannamei genome, and most of type cuticle protein and CBPs genes from early premolt (D2) to them are highly expressed in the neural and digestive system late premolt (D4). After ecdysis (P1–P2), the CaCPs and R&R1- (Supplementary Data 3). type cuticle proteins were produced to harden the cuticle (Fig. 4d,

Fig. 4 Key gene families related to the ecdysone signal pathway in the L. vannamei genome. a Picture of shrimp and its shed exoskeleton. Schematic diagrams below the picture show the changes of the epidermis (ep: epicuticle, ex: exocuticle, en: endocuticle, ed: epidermis, ba: basement membrane) during the molting cycle. The molting cycle can be divided into eight stages: Intermolt (C) stage, Premolt stages (D0–D4), and Postmolt stages (P1–P2). b The shrimp ecdysone signal pathway. Expanded gene families are highlighted in red. c Distribution pattern of the expanded gene families in shrimp ecdysone signal pathway and comparison with other arthropods. d Expression patterns of the expanded gene family genes at the different molting stages of L. vannamei. The overall expression trends in different molting stages are shown in the right. e Phylogenetic analysis of the CHH family based on CDS sequences containing full CHH domain, reconstructed using IQ-tree 1.6.2 under TVMe + I + G4 model with ultrafast bootstrap method. Sequences from L. vannamei genome are annotated with circles of various colors to indicate their corresponding scaffold of origin. GenBank sequences of the CHH family from genus Penaeus s.l., Macrobrachium, Homarus, as well as from species Eriocheir sinensis, Discoplax celeste, Daphnia magna and Drosophila melanogaster were incorporated in this analysis. These sequences were annotated in the phylogenetic tree as follows: species name/gene description/accession number. They were classified into type I, II, and III CHH subfamilies, in which type III CHH consisted of ion transport peptides (ITPs). CHH crustacean hyperglycemic hormone

NATURE COMMUNICATIONS | (2019) 10:356 | https://doi.org/10.1038/s41467-018-08197-4 | www.nature.com/naturecommunications 7 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-018-08197-4

Supplementary Fig. 19). The expanded genes encoding Br-C and with high expression levels in the eyestalk at intermolt and low cuticular proteins in the genome can enhance the functioning of levels at late premolt (Fig. 5a). SREBP is a major regulator of ecdysone, once its concentration reaches a certain threshold. The cholesterol metabolism, and its function is inhibited by a high enhanced ecdysone function might be the primary mechanism to level of cellular cholesterol41. Ecdysone is biosynthesized with assure the frequent molting behavior of shrimps. cholesterol as the substrate, and cholesterol cannot be synthesized In L. vannamei, a striking expansion of the crustacean in vivo in crustaceans28. The ecdysone level46 exhibits a similar hyperglycemic hormone (CHH) family underpins the control of trend to the in vivo cholesterol level during molting53, and both the enhanced ecdysone function (Fig. 4c). Ecdysone synthesis in show opposite trends to expression patterns of MIHs and SREBP. the crustacean Y-organ is mainly under the negative control of These observations suggest that a high cholesterol level may drive the molt-inhibiting hormone (MIH), one of the type II peptides molting through SREBP, which may positively regulate MIH of the CHH family35. We identified seven tandemly duplicated expression, thus suppressing ecdysone synthesis and molting. MIH genes (Fig. 4e) with high sequence similarity (Supplemen- This hypothesis was supported by a reduction (~95%) in tary Fig. 20). This might facilitate the coexpression and expression of all detected MIH genes after SREBP silencing in the cofunctionality of these genes during molting. MIH transcripts eyestalk (Fig. 5b, Supplementary Fig. 24). Injection of cholesterol were mainly located in the eyestalk and ventral nerve (Supple- into shrimps had a similar inhibitory function on MIH expres- mentary Data 4), and their expression was low at late premolt and sion, reducing MIH expression levels by >90% (Fig. 5b). This high from postmolt to early premolt (Supplementary Data 4), the inhibition was possibly exerted through inhibiting nuclear exact reverse of the hemolymph ecdysone level36. Besides, type I translocation of SREBP42. MIH silencing (Supplementary Fig. 24) peptides of the CHH family are also known to suppress significantly accelerated molting of shrimps, making them ecdysteroidogenesis in the Y-organ, though at a higher dose than develop to the late premolt (D3, Fig. 5c, i). However, most control MIH37. In the L. vannamei genome, we identified three major shrimps at intermolt progressed to early premolt (D1) within the clades of type I CHH peptides: (1) type Ia CHH clade, which same duration (Fig. 5c, ii). By contrast, injection of Lipitor, a contains peptides more highly expressed in gut, thoracic ganglion lipid-lowering agent, apparently retarded molting at the and lymphoid organ than in eyestalk and other tissues D3 stage (Fig. 5c, iii), as control shrimps progressed to postmolt (Supplementary Data 4), and may function in ion/osmoregulation (Fig. 5c, iv). These data indicated that SREBP silencing and a high and water uptake but lack hyperglycemic functions, (2) type Ib cholesterol level had similar functions as MIH silencing, sup- clade, which is highly expressed in eyestalk as well as brain and porting the hypothesis that SREBP mediates MIH-regulated thoracic ganglion, and is less well-studied but also believed to ecdysteroidogenesis. assume an ion/osmoregulatory function38, and (3) type Ic clade, The photoperiod has evident effects on crustacean molting43. which is penaeid-specific and contains peptides involved in the Due to the key function of opsin in visual transduction, we regulation of molting, reproduction, energetics and ionic speculate that opsin might be the primary mediator between metabolism39 (Fig. 4e). Significantly, we detected much more photoperiod and molting. This is supported by the upregulated striking expansion in the type Ic CHH than in other type I CHH levels of MIH genes after silencing of several LW opsin genes genes in the genome (Fig. 4e, Supplementary Data 5). The type Ic (Fig. 5b, Supplementary Fig. 25), indicating an inhibitory role of CHH genes showed similar expression patterns as MIH genes in opsin in MIH expression. In addition, after 10 days of opsin different tissues and molting stages, while the other type I CHH silencing, shrimps exhibited a lower molting rate than the control did not (Fig. 4d, Supplementary Data 4). Most type Ic CHH genes (Fig. 5d). are tandemly located in Scaffold 2490 (Fig. 4e) and share high In sum, we infer that cholesterol supply and photoperiod sequence similarity (Supplementary Fig. 21). Similar gene regulate molting via SREBP and opsin, respectively (Fig. 5e). In organization and expression patterns between the type Ic CHH this model, a low cholesterol level corresponds to increased genes and MIH genes suggest that they might play similar roles in SREBP and MIH expression at postmolt. Cholesterol uptake leads molting. Furthermore, MIH and CHH also increase the to inactivation of SREBP and consequent reduction in MIH intracellular concentration of cyclic nucleotides (cAMP or cGMP) expression at early premolt. Simultaneously, ecdysone also rises to and Ca2+ in the Y-organ, which further inhibits ecdysteroidogen- initiate the downstream signaling pathways. The model also esis40 by activating the “triggering” phase of MIH regulation50. proposes the function of opsin in mediating the effect of The genes involved in the “triggering” phase of MIH regulation, photoperiod through modulating the MIH-regulated molting. including cAMP-dependent protein kinase, phosphodiesterase 1, calmodulin and Ca2+/calmodulin-dependent protein kinase genes, were all under positive selection (Supplementary Fig. 22, Immune protection during molting. The newly formed cuticle Supplementary Data 6). The expansion and tandem duplication of shrimps at postmolt is thin and not yet hardened, making them of CHH family genes, and positive selection of MIH “triggering” prone to pathogen infection. We anticipate that the shrimp may phase genes might strengthen the negative control of ecdysone possess an intensified immune protection during ecdysis. Crus- synthesis. Collectively, this intensified ecdysone signal pathway, taceans mainly rely on their innate immune system, consisting of generated by gene expansion, tandem duplication and positive humoral immunity and cellular immunity, to defend against selection, with amplification of ecdysone function and negative pathogens44. We found 111 immune-related genes with differ- control on ecdysteroidogenesis, serves to ensure the frequent and ential expression patterns at different molting stages. These genes precise molting process in the shrimp. were clustered into six groups according to their functions, including 15 antimicrobial peptides (AMPs), 59 pattern recog- nition receptors (PRRs), six phenoloxidase-system-related genes, Key genes mediating environmental factors in molting reg- 14 apoptosis-related genes and 17 homeostasis-related genes ulation. Molting is influenced by many environmental factors, (Supplementary Data 7). such as nutrition and photoperiod. MIH genes of L. vannamei Crustins contributed to 13 of the 15 differentially expressed universally contain the sterol regulatory elements (SRE), i.e. the AMPs (Supplementary Data 7). Most of them showed the binding sites for SRE-binding protein (SREBP), in the regulatory lowest expression levels during late premolt and increased sequence (Supplementary Fig. 23). Furthermore, all detected expression levels immediately after ecdysis (Supplementary transcripts of MIHs and SREBP had similar expression patterns, Fig. 26A). Crustins were mainly distributed in the epidermis,

8 NATURE COMMUNICATIONS | (2019) 10:356 | https://doi.org/10.1038/s41467-018-08197-4 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-018-08197-4 ARTICLE

a 4 d Intermolt 30 d Early premolt dsEGFP injection b 3 Late premolt Opsin RNAi Postmolt 20

2 c a a a 10 1 c b a Relative expression level b a

0 Accumulative molting rate (%) 0 MIHS1036-3 MIHS1036-4 SREBP 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Days postinjection (d) b 8 dsEGFP injection c Alcohol injection e Cholesterol accumulation Photoperiod 6 SREBP RNAi c c Cholesterol injection 4 Opsin RNAi

2 Opsin bb bb bb SREBP Relative expression level a aaaaa 0 MIHS1036-3 MIHS1036-6 MIHC1028

Eyestalk MIH/CHH c MIH silenced Lipitor injection neuroendocrine

i iii MIH/CHH receptors Treatment

Intracellular

ii iv

Control Ecdysone synthesis

Fig. 5 Regulation of MIH-mediated ecdysone synthesis in shrimp. a The expression patterns of SREBP and MIHs in shrimp eyestalks at different molting stages. b The expression profiles of MIHs in the eyestalk with different treatments. Injections of dsEGFP and PBS serve as controls. c Phenotype changes of the shrimp uropods cuticle after treatments. Shrimps at the intermolt stage were treated by RNAi: (i) 2 days after injection of dsMIH; (ii) 2 days after injection of dsEGFP. Shrimps at the early premolt stage were used for Lipitor treatment: (iii) 2 days after injection of Lipitor; (iv) 2 days after injection of PBS. Yellow arrows indicate the space between old and new cuticles at different molting stages. Bar = 50 μm. d Effect of opsin silencing on accumulative molting rates, with the injection of dsEGFP serving as a control. The green arrow shows the time when the treatment led to a significantly lower accumulated ecdysis rate compared to the control group. e The schematic diagram of the regulation of SREBP and opsin on the MIH-mediated ecdysone synthesis. SREBP SRE-binding protein stomach, gill, and hemocytes (Supplementary Fig. 26B). Of the patterns of apoptosis-related genes indicated that they might be 59 identified PRRs, 53 are lectins, important PRRs involved in involved in apolysis rather than immunity. the innate immunity of crustaceans45 (Supplementary Data 7). In summary, AMPs (crustins, anti-lipopolysaccharide factor Several lectin genes were highly expressed at P1 stage and penaeidin), PRRs (lectins and LGBP) and phenoloxidase (Supplementary Fig. 26C), mainly detected in the epidermis, system genes largely contribute to the immune protection during stomach and gill (Supplementary Fig. 26D, 27). Most other ecdysis (Supplementary Fig. 26H). While these immune-related lectins were highly expressed in the hepatopancreas, which genes are important in protecting the host against pathogens in exhibited the lowest expression levels at the P1 stage and was the external environment, their high expression in the hepato- highly expressed from P2 to early premolt (Supplementary pancreas might be related to the in vivo gut flora stability of the Fig. 28). Expressions of the phenoloxidase-system-related genes shrimps and protect the host against pathogens in internal apparently increased at postmolt (Supplementary Data 7) and environment as bacteria, including some pathogens are ubiqui- were mainly distributed in the gill, epidermis, stomach, tous in the hepatopancreas. intestine, hemocytes, and hepatopancreas (Supplementary Fig. 26E, F, G). Recently, a phenoloxidase gene in insects was reported to play a role in skin immunity, protecting the host Impacts of breeding. L. vannamei became an important aqua- against fungal infection after ecdysis46.Mostoftheidentified culture species in the 1980s, and a selective breeding program apoptosis-related genes, mainly expressed in the intestine and began under the “US Marine Shrimp Farming Program hepatopancreas, were highly expressed throughout the molting (USMSFP)” in 198947. Compared to the long history of domes- process, with two peaks at D2 and D4 (Supplementary Fig. 26E, tication in plants, terrestrial livestock and poultry48,49, the history F).Atpremolt,anewcuticleisgeneratedafterapolysis,when of L. vannamei breeding is relatively short. However, the selection the old cuticle is separated from the epidermis. The expression pressure in L. vannamei is very high, as the shrimp could be

NATURE COMMUNICATIONS | (2019) 10:356 | https://doi.org/10.1038/s41467-018-08197-4 | www.nature.com/naturecommunications 9 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-018-08197-4 selected one generation per year and a single spawning female can between crustaceans and other ecdysozoans, including insects. As produce >300,000 offspring50. To examine the genomic con- a vital process in shrimp, molting is tightly linked to growth, sequences of such intensive selective breeding, we selected 22 metamorphosis and reproduction. Elucidating the regulatory individuals for genome resequencing, including 8 individuals mechanisms of molting will provide useful clues for genetic from the wild and 14 individuals from five different broodstocks analysis of these important processes. (Supplementary Table 20). We sequenced them with an average Our genomic analyses have also revealed that almost 30 years of >23× genome coverage (Supplementary Table 21), and iden- of breeding has exerted a significant impact on the genome of the tified 31,993,474 single nucleotide polymorphisms (SNPs) among L. vannamei broodstocks. An increase in aquaculture production the shrimps (Supplementary Table 22), with an average of 19 and efficiency can be achieved through genetic improvement of SNPs/Kb. The SNPs density in the L. vannamei genome is much cultured stocks. The assembled shrimp genome and the large higher than that in chickens51 and pigs52, and similar to that in amount of SNP markers will provide a useful resource for the the oyster, which has extremely high level of heterozygosity6. application of genome-wide association studies and genomic Most SNPs are located at the intergeneic regions (86.68%), and selection, and thus accelerate genetic improvements in shrimp the coding regions display much lower genetic diversity than the culture. introns. In the coding regions, 261,056 synonymous and 206,026 fi nonsynonymous SNPs were identi ed (Supplementary Table 23). Methods Around 1400 SNPs are located at splice acceptor sites and 1452 Genome sequencing. The muscle of a single male adult of L. vannamei (Kehai SNPs at splicing donor sites. To our knowledge, this represents No.1 variety, Hainan, China) was used for DNA extraction and genome sequen- the largest set of high-quality SNPs obtained from L. vannamei, cing. Genomic DNA was prepared using TIANamp Marine Animal DNA Kits and it would constitute a valuable resource for genetic research (TIANGEN, Beijing, China). The DNA was sheared using a sonication device for paired-end (PE) library construction. Libraries with insert sizes ranging from and selection. 200 bp to 20 Kb were constructed according to the instructions provided in the Significant differences were observed between the genetic Illumina library preparation kit. All libraries were sequenced on HiSeq 2000 and diversity of wild shrimp and various broodstocks. The average HiSeq 2500 platform (Illumina, San Diego, USA). These raw reads were subse- nucleotide diversity (π) at the whole-genome level for the two quently trimmed for quality using Trimmomatic v.0.35 (http://www.usadellab.org/ −3 −3 cms/index.php?page=trimmomatic). Illumina sequence adaptors were removed, groups was 3.69 × 10 and 2.70 × 10 , respectively. Phyloge- low-quality bases from the starts or ends of raw reads were trimmed, and reads netic analyses based on the whole-genome SNPs separated the were scanned using a 4-bp sliding-window and trimmed when the average quality wild individuals from those in the broodstocks, which were then per base dropped below 15. The clean reads obtained from this process were used grouped into two clades, with the individuals in Ecuador for subsequent analysis. For PacBio library construction, the genomic DNA of L. vannamei was sheared clustered separately from the other commercial breeding lines, to ~20 Kb and the fragments below 7 Kb were filtered using BluePippin. Filtered apparently because the former only underwent four generations DNA fragments were then converted into the proprietary SMRTbell library using of selection (Fig. 6a). the PacBio DNA Template Preparation Kit. In total, ~133 Gb of quality-filtered θπ Comparison of the divergence index (FST) and the ratios data were obtained from the PacBio sequencing. π π ( wild/ breeding) between the wild and cultured populations allowed us to identify 14 regions (1.66 Mb in size) (Fig. 6b, c), Genome size estimation. We adapted a method called Jellyfish56, which is based containing 28 predicted genes (Supplementary Table 24), that are on k-mer distribution, to estimate the genome size with the high-quality reads from under strong selection. GO and KEGG enrichment analyses show the short-insert size libraries (500 bp). We obtained a k-mer depth distribution from the Jellyfish analysis and could clearly observe the peak depth from the that Rho protein signal transduction pathway and glycosphingo- distribution data. In order to obtain an estimation of the L. vannamei genome size, lipid biosynthesis pathway are enriched. Two copies of A-kinase the following formula was applied: genome size = total_kmer_num/kmer_depth, anchor protein 13 (AKAP13), involved in the Rho protein signal where total_kmer_num is the total number of k-mers from all reads, and transduction pathway, are strongly selected (Fig. 6c, Supplemen- kmer_depth is the peak depth. tary Fig. 29). AKAP13 is a scaffold protein with guanine exchange factor activity and plays a role in TLR2-mediated NF-κB Genome assembly. All of the subreads from the PacBio sequencing were activation, which is involved in innate immunity53. The selected assembled using the WTDBG software (https://github.com/ruanjue/wtdbg-1.2.8). gene in the glycosphingolipid biosynthesis pathway is annotated The assembled sequence was then polished using Quiver (SMRT Analysis v2.3.0) with default parameters. To ensure the high accuracy of the genome assembly, as lactosylceramide 4-alpha-galactosyltransferase (A4GALT) several rounds of iterative error correction were performed using the Illumina clean (Fig. 6c). A4GALT is involved in protein glycosylation and reads. protein modification process necessary for synthesis of the Finally, the de novo assembly of the PacBio read sequences was carried out receptor for bacterial verotoxins in mammals54. Further, one using continuous long reads following WTDBG. The paired-end reads of Illumina (200, 300, and 500 bp libraries) were mapped to polish assembly sequence from type of anti-lipopolysaccharide factor like protein (ALF) (Sup- Quiver by BWA (https://github.com/PacificBiosciences/GenomicConsensus). The 55 plementary Fig. 29) that is essential to immune defense was also SNPs and small insertions and deletions (INDELs) were called and corrected by found to be under strong selection. These data illustrate that SAMTools and an in-house script (“SNP_correction.pl”, deposited in https:// artificial selective breeding under culture conditions influences github.com/jianbone/L_vannamei_genome). Finally, we generated scaffolds and performed gap-filling with SSPACE 3.0 with parameter values of “-x 1 -m 50 -o 10 the disease resistance of shrimp. -z 200 -p 1” using meta-paired sequencing data (Illumina 5, 10, and 20 Kb libraries). Discussion Quality assessment of genome assembly. To evaluate the quality of the genome The high-quality genome assembly of L. vannamei provides the assembly, we mapped partial reads from the short-insert size library back to the opportunity to understand various biological processes of shrimp scaffolds using Bowtie2 with the following parameters: --rdg 3,1 --rfg 3,1 --gbar 2 57. at the genome level. The most prominent characteristic of the A total of 93% of the reads could be mapped back to the current assembly. To coding region of the shrimp genome is the expansion of a series of evaluate the completeness of the genome assembly in genetic regions, we used conserved core genes by running software CEGMA on the assembly of the L. genes related to visual system, nerve signal conduction, and vannamei genome58. molting. These gene expansions might have provided the shrimp with its excellent eyesight and rapid nerve signal conduction, Repeat annotation. Both RepeatModeler and RepeatMasker were used to perform better equipping it to adapt to its benthic habitat. Our genome de novo identification and mask of repeats59. To ensure the integrity of genes in the assembly also sheds light on the molting process and provides subsequent analysis, low complexity or simple repeats were not masked in this important evidences for exploring the similarities and differences analysis, because some of them could be found within the genes.

10 NATURE COMMUNICATIONS | (2019) 10:356 | https://doi.org/10.1038/s41467-018-08197-4 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-018-08197-4 ARTICLE

a c S1 S2 A-kinase anchor protein 13–1

Z2 2 Wild 1.5 Breeding Z1 Breeding ) –3

G1 K1 1 G2 Pi (×10 0.5 K2

E1 E6 0 0246810121416

E3 Lactosylceramide 4-alpha-galactosyltransferase 12 E4 Wild Breeding E2 ) 8 E5 –3

M2 M6 M1 M5 Pi (×10 4

M4 M3 M8 M7 Wild 0 0 5 10 15 20 25 30 SNP_index b 0.3

0.2 Fst

0.1

0.0 Linkage groups 12 3 46 89101214 161719212425 27282931323335373941424344

0.6

0.8

1.0

1.2

Log2 (ratio of Pi) 1.4

1.6

Fig. 6 Genetic diversity and selective sweep analyses in L. vannamei. a The phylogenetic tree of wild (blue) and breeding shrimp (yellow) constructed using whole-genome SNPs. M1−M8 represent the wild shrimp collected from Mexico, E1–E6 represent the breeding shrimp collected from Ecuador, K1 and K2 represent the Kehai No.1 variety from China, G1 and G2 represent Guihai No.1 variety in China, S1 and S2 represent Shrimp Improvement broodstock in

USA, Z1 and Z2 represent Charoen Pokphand broodstock in Thailand. b Genome-wide view of differentiation (FST) and reduction in diversity (πwild/ πbreeding) statistics associated with the wild and breeding shrimp. The blue line indicates the top 1% of rank level values for empirical percentiles. c The genetic diversity value θπ in wild and breeding population for the genes encoding A-kinase anchor protein 13-1 (AKAP13-1) and Lactosylceramide 4-alpha- galactosyltransferase (A4GALT). A sliding-window approach (2 Kb windows with 400 bp increments) was used to calculate θπ of every window for each gene, and the SNP-index indicates the sliding-window of each gene

Transcriptome sequencing and analyses. The L. vannamei samples at different transcriptome reads to the L. vannamei genome60. Cufflinks v2.2.1 (http://cole- developmental stages, molting stages, and different adult tissues were collected at trapnell-lab.github.io/cufflinks/) was used to calculate expected FPKM (Fragments the CAS Key Laboratory of Experimental Marine Biology in Qingdao. Total RNA Per Kilobase of transcript per Million fragments mapped) as expression values for was isolated and purified from all the samples using the TRIzol extraction reagent each transcript. (Thermo Fisher Scientific, USA), in accordance with the manufacturer’s protocol. RNA quality and concentration were assessed using 1% agarose gel electrophoresis, and RNA concentration was measured using Nanodrop 2000 spectrophotometer Gene prediction and annotation. Protein-coding region identification and gene (Thermo Fisher Scientific, USA). The total RNA of the same tissue was pooled to prediction were performed through a combination of homology-based prediction, construct the Illumina sequencing libraries, and then paired-end sequences were de novo prediction and transcriptome-based prediction methods. Proteins from generated using Illumina HiSeq 2000 and HiSeq 2500 platform. The clean reads several species, including D. pulex, P. hawaiensis, Drosophila melanogaster, Ano- were assembled using Trinity (v2.6.5, https://github.com/trinityrnaseq/ pheles gambiae, Locust migratoria and Bombyx mori, were downloaded from NCBI. trinityrnaseq/releases) to obtain transcripts with min_kmer_cov set to 2 and all Protein sequences were mapped against L. vannamei with Exonerate version 2.2.0 other parameters set to default. The TopHat v1.2.1 package was used to map (http://www.ebi.ac.uk/~guy/exonerate/). The blast hits were used in predicting the

NATURE COMMUNICATIONS | (2019) 10:356 | https://doi.org/10.1038/s41467-018-08197-4 | www.nature.com/naturecommunications 11 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-018-08197-4 gene structure of the corresponding genomic regions. The de novo prediction Regulation of MIH-mediated ecdysone synthesis. To examine how the cho- software, Augustus v2.5.5, was used to predict the coding regions in the repeat- lesterol level and photoperiod regulate molting via MIH, we conducted SREBP masked genome. RNA-Seq data were mapped against the assembly using Tophat gene RNAi, MIH genes RNAi, opsin genes RNAi, Lipitor injection, and cholesterol v2.1.160. Cufflinks v2.2.1 was then used to convert the transcripts from the result of injection experiments, using dsEGFP RNAi and alcohol/phosphate buffer saline Tophat to gene models. All gene models derived from these three methods were (PBS) injection as negative control treatments. The experiments used healthy integrated by EvidenceModeler (EVM) into a nonredundant gene set61. shrimps with a body length of 9.55 ± 0.25 cm and a body weight of 11.20 ± 1.36 g, Functional annotations of the obtained gene set were conducted using BLASTP which were cultured and acclimated in fiberglass tanks with aerated circulating with an E-value of 1e−5 against the NCBI-NR, SwissProt databases, and KEGG seawater for 2 days before the experiment. database. Protein domains were annotated by mapping to the InterPro and Pfam To synthesize dsRNA for RNAi experiment, DNA templates were amplified databases using InterProScan and HMMER62,63. The pathways in which the genes from the eyestalk cDNA sample. The primer pair, dsEGFP-F and dsEGFP-R might be involved were derived from gene mapping against the KEGG databases. (Supplementary Table 25), was used to amplify a 289 bp DNA fragment of the The Gene Ontology (GO) terms were extracted from the corresponding EGFP gene from the pEGFP-N1 plasmid. The dsRNA of SREBP containing a InterProscan or Pfam results. fragment of 754 bp located at the 3′ end of the coding region, and the dsRNA of MIH1036-6 containing a fragment of 377 bp covering the whole coding region, were amplified using the same method with primers in Supplementary Table 25. Noncoding RNA prediction. Noncoding RNA genes were predicted for the repeat- The dsRNA of LW opsin gene (LW-1) were amplified using two pairs of primers masked genome by sequence- and structure-based alignments with the Rfam designed using the conserved regions of the genes (Supplementary Table 26). The noncoding RNA database (http://xfam.org/), with an E-value cutoff of 1e−02 using PCR product (~500 bp) was generated using a T7 promoter anchor attached to the Infernal (http://eddylab.org/infernal/). Specifically, expression criteria were applied two primers used to amplify the product. The PCR products were purified using for miRNA identification, and small RNA sequencing data of L. vannamei were Gel Extraction Kit (OMEGA, Japan). dsRNA was synthesized with TranscriptAid downloaded from the SRA database of NCBI (SRX2648655, SRX2648646, and T7 High Yield Transcription Kit (Thermo Fisher Scientific, USA), in accordance SRX682234). Adaptor and primer sequences were trimmed, and low-quality with the manufacturer’s protocol. sequences were removed. Clean sRNA reads were compared with the Rfam data- We optimized the effective silencing dose of each dsRNA by injecting three base (http://xfam.org/) to exclude noncoding RNAs other than miRNAs. The different dosages, namely 1, 2, and 4 μg, of dsMIH1036-6 or dsSREBP or dsLW-1 remaining sRNA reads were subjected to miRNA identification by mapping to or dsEGFP (as control) into healthy shrimps at the intermolt stage (C). Five predicted pre-miRNA structures in the L. vannamei genome using miRDeep2 and individuals were tested for each dosage for dsMIH1036-6 and dsSREBP, while each miReap (http://sourceforge.net/projects/mireap/). Conserved miRNAs were iden- dosage of dsLW-1 was tested on three individuals. At 48 h after dsRNA injection, tified by BLAST alignment to mature miRNA sequences deposited in the miRBase the eyestalks of the shrimps in each group were collected for RNA extraction. (http://mirbase.org/), requiring alignment ≥15 nt which covers the miRNA seed Expression levels of each gene were determined by qPCR. The primers used for region (position 2−8 nt) and overall mismatch ≤2. qPCR are listed in Supplementary Tables 25 and 26. Primers 18S-qF and 18S-qR were used to detect the expression of the internal reference gene, 18S rRNA. The specificity of the qPCR product was validated by melting-curve analysis performed Gene family analyses. Gene family analysis was performed using OrthoMCL64. at the end of each qPCR reaction. The optimal silencing doses were used in The protein-coding genes of L. vannamei and nine other arthropods (downloaded subsequent experiments. from NCBI: www.ncbi.nlm.nih.gov), including Strigamia maritima, Ixodes scapu- To test the effect of SREBP on MIH expression, 20 individuals at the postmolt laris, Tetranychus urticae, P. hawaiensis, D. pulex, Acyrthosiphon pisum, L. (P) stage were divided into two groups. In the experimental group, each individual migratoria, B. mori, and D. melanogaster, were aligned to each other using the was injected with 1 µg of dsSREBP, while individuals in the control group were BLASTP program. Similarity information from the pairwise sequence alignments injected with 1 µg of dsEGFP each. To test the effect of high cholesterol level on was used as distance parameters for gene family clustering. Species-specific genes MIH expression, 20 individuals at the postmolt (P) stage were divided into two were those that had no homologs in the other species used in this analysis. Orphan groups. Each individual in the experimental group was injected with 400 μg genes are a group of genes identified from species-specific genes that have no cholesterol dissolved in 100% alcohol. The same volume of alcohol was injected homologs to any genes from NCBI Nr database. However, these orphan genes are into individuals in the control group. For both experiments, the eyestalks of all supported by transcriptome data. shrimps in each group were collected at 48 h after injection and total RNA was extracted from different samples using RNAiso Plus reagent (TaKaRa, Japan). Expression levels of three MIH genes were estimated by qPCR, as described above. Phylogenetic tree construction. A phylogenetic tree was constructed using the To test the effect of opsin on MIH expression, shrimps at the same molting = single-copy homologous genes from D. pulex, P. hawaiensis, D. melanogaster, A. stage (D0-D1) were divided into six groups: three dsLW-1 groups (n 20) and = gambiae, L. migratoria, B. mori, Tribolium castaneum, Apis mellifera, Zootermopsis three groups for dsEGFP injection (n 20) as controls. The concentration of nevadensis, P. humanus, A. pisum, I. scapularis, T. urticae, and S. maritima, with dsRNA was 1 µg for all groups. We examined the occurrence of molting three times the RAxML software that is based on the maximum likelihood method65. Briefly, a day after RNAi (at 08:00, 14:00 and 20:00). After silencing for 48, 72, and 96 h, the homologous genes from the genomes of these species were clustered together three shrimps from each group were selected to determine the expressions of MIH using the OrthoMCL software. Each of these clusters was then filtered to obtain a and three LW opsin genes by qPCR. total of 88 single-copy orthologs using our in-house Python script (find_multi- During the aforementioned opsin RNAi experiment, we examined the ple_snp.py). MUSCLE (http://www.drive5.com/muscle) was used for sequence occurrence of molting three times a day (at 08:00, 14:00, and 20:00). At 9 days after alignment, at its default setting. Positions with gaps and missing data were trimmed silencing, we administered a second injection to ensure the effects of silencing. The using an in-house Python script (allfasta2snp.py). The final dataset contained observation lasted for 18 days. 19,161 amino acids and was used to construct the phylogenetic tree with RAxML66 To examine the effect of MIH silencing on molting, 20 individuals at the under the JTT matrix-based model67. Initial trees for the heuristic search were intermolt stage (C) were divided into two groups. Each individual in the μ obtained automatically by applying neighbor-joining and BioNJ algorithms to a experimental group was injected with 4 g MIH1036-6 dsRNA, and each individual μ matrix of pairwise distances estimated using a JTT model. The best tree had a log in the control group was injected with 4 g EGFP dsRNA. The tails of all the likelihood of −3308.02, and was used as an input tree for divergence time esti- individuals in different groups were dissected for observation of the molting stage mation by Bayesian relaxed molecular clock approaches implemented in under a microscope at 48 h after dsRNA injection. To test the effect of low – MCMCTREE from the PAML package68. CODEML from the PAML package was cholesterol level on molting, 20 individuals at the early premolt stage (D0 D2) implemented to estimate the neutral substitution rate per year among species. were divided into two groups. Each individual in the experimental group was μ μ MCMCTREE was then used to calculate gradient vector (g) and Hessian matrix injected with 5 g atorvastatin calcium (Lipitor) predissolved in 20 l PBS (with μ (H). Fossil calibrations were used as priors for the divergence time estimation69. 2.50% DMSO), and each shrimp in the control group was injected with 20 l PBS Cauchy distributions were used with default parameters “p = 0.1, c = 1”. The (with 2.50% DMSO). The uropods of all the individuals in the two groups were MCMCTREE is running twice each for 1,000,000 generations with the sample dissected for observation of the molting stage under a microscope at 48 h after frequency of 50 and a burn-in phase of 50,000 iterations. Tracer (http://beast.bio. injection. ed.ac.uk/) was used to assess chain convergence and adequate effective sample sizes of all parameters. The resulting time-calibrated tree was visualized by using MEGA766. Resequencing and SNP identification. A total of 22 shrimp individuals were selected for genome resequencing. These included eight wild individuals collected from Baja California Sur, Mexico, six cultured individuals selected for four gen- Positive selection. To assess the contribution of natural selection on the single- erations in Ecuador, and eight individuals from four commercial breeding lines in copy orthologs in the L. vannamei genome, the ratios (ω) of nonsynonymous China, the USA and Thailand (Supplementary Table 20), which had all been substitution per nonsynonymous site (dN) to synonymous substitution per subjected to selective breeding for over 20 generations. The raw reads were mapped synonymous site (dS) were calculated using the software package PAML v4.48a68. to the reference genome using BWA v0.7.12 70. The PCR duplicates of each of the The homologs were aligned using MUSCLE. The branch-site model was used to samples were removed with Picard Mark Duplicates. Reads around INDELs from detect positive selection along the foreground branch. Likelihood ratio tests were the BWA alignment were realigned using the IndelRealiger option in the Genome applied to test significant differences between the alterative and null models. Analysis Toolkit (GATK). The HaplotypeCaller of the Genome Analysis Toolkit

12 NATURE COMMUNICATIONS | (2019) 10:356 | https://doi.org/10.1038/s41467-018-08197-4 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-018-08197-4 ARTICLE

GATK was used to construct the general variant calling file (gVCF) for every 11. Kao, D. et al. The genome of the crustacean Parhyale hawaiensis, a model for individual. The individual gVCF files were combined by GenotypeGVCFs function animal development, regeneration, immunity and lignocellulose digestion. to form a single variant calling file, which includes all the sites. A strict filtering of ELife 5, e20062 (2016). the SNP calls was carried out using the guidelines given by the Broad Institute 12. Colbourne, J. K. et al. The ecoresponsive genome of Daphnia pulex. Science using the options: “FS > 60.0 || MQ < 40.0 || QD < 2.0 || SOR > 3.0”. SNPs 331, 555–561 (2011). with minor allele frequency (MAFs) less than 5% were discarded. The missing rate 13. Yu, Y. et al. Genome survey and high-density genetic map construction of a genotype for population was 10%. Variants were annotated with the provide genomic and genetic resources for the Pacific White Shrimp SNPeff annotation program to identify the synonymous and nonsynonymous Litopenaeus vannamei. Sci. Rep. 5, 15612 (2015). mutations. 14. Shen, S. Z. et al. Calibrating the End-Permian mass extinction. Science 334, 1367–1372 (2011). Identification of candidate genes under breeding selection. A phylogenetic tree 15. Fransen, C. H. J. M. & Grave, S. D. Evolution and Radiation of Shrimp-Like of the 22 individuals was constructed using filtered SNP by the SNPhylo software Decapods: An Overview Vol. 18 (Taylor and Francis/CRC Press, Boca Raton, (parameters: -l 0.6, -m 0.05). To identify the genomic regions significantly affected Crustacean Issues, 2009). fi π π by arti cial breeding, we calculated pairwise nucleotide diversity ( wild/ breeding) 16. Kashi, Y. & King, D. G. Has simple sequence repeat mutability been selected – and divergence index (FST) between the wild and cultured populations using a to facilitate evolution? Isr. J. Ecol. Evol. 52, 331 342 (2006). sliding-window approach (100 Kb windows with 20 Kb increments) using 17. Yuan, J. et al. Genomic resources and comparative analyses of two economical VCFtools (http://vcftools.sourceforge.net/). The genomic regions with top 1% of penaeid shrimp species, Marsupenaeus japonicus and Penaeus monodon. rank level values for empirical percentiles detected by both methods were identified Marine. Genomics 39,22–25 (2018). as potential selective sweeps. GO and KEGG enrichment of genes from the selective 18. Kashi, Y. & King, D. G. Simple sequence repeats as advantageous mutators in sweeps were performed using KOBAS (http://kobas.cbi.pku.edu.cn/). evolution. Trends Genet. 22, 253–259 (2006). 19. Wissler, L., Gadau, J., Simola, D. F., Helmkampf, M. & Bornberg-Bauer, E. Code availability. The associated Perl and Python scripts can be viewed and Mechanisms and dynamics of orphan gene emergence in insect genomes. – downloaded free at the github repository: https://github.com/jianbone/ Genome Biol. Evol. 5, 439 455 (2013). L_vannamei_genome. The software used for data analyses include: Trimmomatic 20. Hu, J., Chen, K. & Bao, Z. A study of the structure of compound eyes of the v.0.35, BluePippin v6.31, Jellyfish-0.3, wtdbg-1.2.8, SMRT Analysis v2.3.0, BWA shrimp, Penaeus chinensis. Mar. Sci. 2,30–34 (1997). v0.7.12, SSPACE 3.0, Bowtie 2.3.4.3, CEGMA v3, RepeatModeler Open-1.0, 21. Luchiari, A. C., Marques, A. O. & Freire, F. A. M. Effects of substrate solour RepeatMasker 4.0.5, Trinity v2.6.5, TopHat v1.2.1, Cufflinks v2.2.1, Exonerate preference on growth of the shrimp Litopenaeus Vannamei (Boone, 1931) v2.2.0, Augustus v2.5.5, Tophat v2.1.1, EvidenceModeler (EVM) r03062010, (Decapoda, Penaeoidea). Crustaceana 85, 789–800 (2012). InterProScan 5, HMMER-3.0a1, miRDeep2, OrthoMCL v1.4, RAxML Workbench 22. Futahashi, R. et al. Extraordinary diversity of visual opsin genes in dragonflies. 1.0, MUSCLE 3.8.31, PAML v4.48a, MEGA7, Genome Analysis Toolkit (GATK) Proc. Natl. Acad. Sci. USA 112, E1247–E1256 (2015). 3.6, SNPeff LGPLv3,VCFtools 1.0.0.0, KOBAS 3.0. 23. Wine, J. J. & Krasne J. The cellular organization of crayfish escape behavior. In The Biology of Crustacea (eds Sandeman, D. C. & Atwood, H. L.) 241−292 Data availability (Academic Press, New York, 1982). 24. Fan, S., Hsu, K., Chen, F. & Ho, B. On the high conduction velocity of the giant Genome sequences data that support the findings of this study have been deposited nerve fiber of shrimp Penaeus orientalis. Chin. Sci. Bull. 12,51–52 (1961). in NCBI GenBank with the accession codes of QCYY00000000. RNA-Seq data 25. Castelfranco, A. M. & Hartline, D. K. Evolution of rapid nerve conduction. were used for annotation and biological analyses: NCBI SRA SRR1460493 Brain Res. 1641,11–33 (2016). −SRR1460495 (https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP043546), − 26. Albertin, C. B. et al. The octopus genome and the evolution of cephalopod SRR1460504 SRR1460505 (https://www.ncbi.nlm.nih.gov/Traces/study/? + acc=SRP043546), SRX1098368−SRX1098375 (https://www.ncbi.nlm.nih.gov/ neural and morphological novelties. Nature 524, 220- (2015). Traces/study/?acc=SRP061180). The genome sequences can also be downloaded 27. Xu, K. & Terakawa, S. Fenestration nodes and the wide submyelinic space from the genome database of L. vannamei (http://www.shrimpbase.net/vannamei. form the basis for the unusually fast impulse conduction of shrimp myelinated – html), where BLAST searching and gene structure visualization are also available. A axons. J. Exp. Biol. 202, 1979 1989 (1999). reporting summary for this article is available as a Supplementary Information file. 28. Guary, J.-C. B. & Kanazawa, A. Distribution and fate of exogenous cholesterol The source data underlying Figs. 1, 2, and 6b are provided as a Source Data file. during the molting cycle of the prawn, Panaeus japonicus bate. Comp. Biochem. Physiol. Part A: Physiol. 46,5–10 (1973). 29. Liebeskind, B. J., Hofmann, H. A., Hillis, D. M. & Zakon, H. H. Evolution of Received: 25 May 2018 Accepted: 7 December 2018 animal neural systems. Annu Rev. Ecol. Evol. S 48, 377–398 (2017). 30. Etxeberria, A. et al. Dynamic modulation of myelination in response to visual stimuli alters optic nerve conduction velocity. J. Neurosci. 36, 6937–6948 (2016). 31. Xu, T. L. & Gong, N. Glycine and glycine receptor signaling in hippocampal neurons: diversity, function and regulation. Prog. Neurobiol. 91, 349–361 References (2010). 1. Dall, W., Hill, B., Rothlisberg, P. & Sharples, D. The biology of the Penaeidae. 32. Chang, E. S., Bruce, M. J. & Tamone, S. L. Regulation of crustacean molting—a Bailliere Tindall Cox 27,1–461 (1990). multi-hormonal system. Am. Zool. 33, 324–329 (1993). 2. Godin, D. M. et al. Evaluation of a fluorescent elastomer internal tag in 33. Chaharbakhshi, E. & Jemc, J. C. Broad-complex, tramtrack, and bric-a-brac juvenile and adult shrimp Penaeus vannamei. Aquaculture 139, 243–248 (BTB) proteins: critical regulators of development. Genesis 54, 505–518 (1996). (2016). 3. Gao, Y. et al. Transcriptome analysis on the exoskeleton formation in early 34. Roer, R., Abehsera, S. & Sagi, A. Exoskeletons across the Pancrustacea: developmetal stages and reconstruction scenario in growth-moulting in comparative morphology, physiology, biochemistry and genetics. Integr. Litopenaeus vannamei. Sci. Rep. 7, 1098 (2017). Comp. Biol. 55, 771–791 (2015). 4. Global Aquaculture Production 1950–2016. http://www.fao.org/fishery/ 35. Nakatsuji, T., Lee, C. Y. & Watson, R. D. Crustacean molt-inhibiting statistics/global-aquaculture-production/query/en (2018). hormone: structure, function, and cellular mode of action. Comp. Biochem 5. Alcivar-Warren, A., Dunham, R., Gaffney, P., Kocher, T. & Thorgaard, G. Phys. A 152, 139–148 (2009). First aquaculture species genome mapping workshop. Anim. Genet. 28, 36. Chen, H. Y., Watson, R. D., Chen, J. C., Liu, H. F. & Lee, C. Y. Molecular 451–452 (1997). characterization and gene expression pattern of two putative molt-inhibiting 6. Zhang, G. F. et al. The oyster genome reveals stress adaptation and complexity hormones from Litopenaeus vannamei. Gen. Comp. Endocr. 151,72–81 (2007). of shell formation. Nature 490,49–54 (2012). 37. Chung, J. & Webster, S. Moult cycle-related changes in bio-logical activity of 7. Berthelot, C. et al. The rainbow trout genome provides novel insights into moult-inhibiting hormone (MIH) and crustacean hy-perglycaemic hormone evolution after whole-genome duplication in vertebrates. Nat. Commun. 5, (CHH) in the , Carcinus maenas: from target to transcript. Eur. J. 3657 (2014). Biochem. 270, 3280–3288 (2003). 8. Xiao, J. H. et al. Signatures of selection in tilapia revealed by whole genome 38. Shi, L. L., Li, B., Zhou, T. T., Wang, W. & Chan, S. M. F. Functional and resequencing. Sci. Rep. 5, 14168 (2015). evolutionary implications from the molecular characterization of five 9. Liu, Z. J. et al. The channel catfish genome sequence provides insights spermatophore CHH/MIH/GIH genes in the shrimp Fenneropenaeus into the evolution of scale formation in teleosts. Nat. Commun. 7, 11757 merguiensis. PLoS ONE 13, e0193375 (2018). (2016). 39. Montagne, N., Desdevises, Y., Soyez, D. & Toullec, J. Y. Molecular evolution of 10. Gutekunst, J. et al. Clonal genome evolution and rapid invasive spread of the the crustacean hyperglycemic hormone family in ecdysozoans. BMC Evol. marbled crayfish. Nat. Ecol. Evol. 2, 567–573 (2018). Biol. 10, 62 (2010).

NATURE COMMUNICATIONS | (2019) 10:356 | https://doi.org/10.1038/s41467-018-08197-4 | www.nature.com/naturecommunications 13 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-018-08197-4

40. Covi, J. A., Chang, E. S. & Mykles, D. L. Conserved role of cyclic nucleotides in Acknowledgements the regulation of ecdysteroidogenesis by the crustacean molting gland. Comp. We acknowledge financial support from the State 863 High Technology R&D Project Biochem Phys. A 152, 470–477 (2009). of China (2012AA10A404 to J.X.), the National Natural Science Foundation of China 41. Adams, C. M. et al. Cholesterol and 25-hydroxycholesterol inhibit activation (31830100 to F.L. and 41876167 to J.Y.), the grants from Qingdao National of SREBPs by different mechanisms, both involving SCAP and insigs. J. Biol. Laboratory for Marine Science and Technology (MS2017NO04 to F.L.), The Senior Chem. 279, 52772–52780 (2004). User Project of RV KEXUE (KEXUE2018G19 to F.L. and X.Z.), the China Agriculture 42. Brown, M. S. & Goldstein, J. L. The SREBP pathway: regulation of cholesterol Research system-48 (CARS-48 to F.L.) and the Hong Kong Collaborative Research metabolism by proteolysis of a membrane-bound transcription factor. Cell 89, Fund (CRF) project (No. C4042-14G to K.H.C). We acknowledge the support 331–340 (1997). from High Performance Computing Center, Institute of Oceanology, CAS. We would 43. Guo, B., Wang, F., Dong, S. L. & Gao, Q. F. The effect of rhythmic light color like to express our gratitude to Prof. Jun Yu, Prof. Songnian Hu, Prof. Jingfa Xiao and fluctuation on the molting and growth of Litopenaeus vannamei. Aquaculture Dr. Tongwu Zhang (Beijing Institute of Genomics, Chinese Academy of Sciences), 314, 210–214 (2011). Prof.XimingGuo(RutgersUniversity),Dr.RuiqiangLi,Dr.ZhiJiangandDr.Jinbo 44. Li, F. H. & Xiang, J. H. Recent advances in researches on the innate immunity Zhang (Novogene Bioinformatics Institute), Dr. Junyi Wang and Dr. Dongliang Zhan of shrimp in China. Dev. Comp. Immunol. 39,11–26 (2013). (Hangzhou 1Gene Ltd), for their support of the shrimp genome project, genome 45. Denis, M., Thayappan, K., Ramasamy, S. & Munusamy, A. Lectin in innate sequencing and assembly. We appreciate Dr. Hao Huang (Hainan Guangtai Ocean immunity of Crustacea. Austin Biol. 1, 1001 (2016). Breeding Co., Ltd) for the help with shrimp materials and Dr. Yang Zhang and Dr. 46. Zhang, J. et al. Prophenoloxidase-mediated ex vivo immunity to delay fungal Han Cao (Bionano Genomics) help for the construction of optical map. We thank Cui infection after insect Ecdysis. Front. Immunol. 8, 1445 (2017). Zhao, Jiankai Wei, Xiaoqing Sun, JingwenLiu,JiangliDu,MingzheSunandYan 47. Wyban, J., Swingle, J., Sweeney, J. & Pruder, G. Specific pathogen-free Penaeus Zhang for help with DNA, RNA extraction, and data analysis. We are grateful to Prof. vannamei. World. Aquaculture 24,39–45 (1993). LiSun,Prof.LinshengSong,Prof.JinshengSun,Prof.ZhaoxiaCui,Prof.Baozhong 48. Hake, S. & Ross-Ibarra, J. Genetic, evolutionary and plant breeding insights Liu and Dr. Pin Huan for helpful discussions. We thank other faculty and staff at the from the domestication of maize. eLife 4, e05861 (2015). InstituteofOceanology,ChineseAcademyof Science who contributed to the shrimp 49. Hickey, J. M., Chiurugwi, T., Mackay, I., Powell, W. & Cgi, I. G. S. Genomic genome project. We thank Dr. Jing Qin for revisions of the manuscript. We prediction unifies animal and plant breeding programs to form platforms for appreciate Mr. Huawei Zhang for shrimp photography. biological discovery. Nat. Genet. 49, 1297–1303 (2017). fi 50. Argue, B. J., Arce, S. M., Lotz, J. M. & Moss, S. M. Selective breeding of Paci c Author contributions white shrimp (Litopenaeus vannamei) for growth and resistance to Taura Syndrome Virus. Aquaculture 204, 447–460 (2002). J.X., F.L., Lei Wang and Xiaojun Zhang initiated, managed, and drove the shrimp 51. Rubin, C. J. et al. Whole-genome resequencing reveals loci under selection genome sequencing project. Xiaojun Zhang,C.Z.,K.Y.,andJ.K.collectedtheanimal during chicken domestication. Nature 464, 587–U145 (2010). material. Xiaojun Zhang, Y.S., S.L., Y.G., J.Y., and Y.Y. prepared DNA sequencing and 52. Groenen, M. A. M. et al. Analyses of pig genomes provide insight into porcine RNA-Seq analysis. J.Y., Y.S., J.R., B.L., X.W., W.L., X. Zhu, and Long Wang per- demography and evolution. Nature 491, 393–398 (2012). formed genome assembly, gene annotation, genome structure analyses, and phylo- 53. Shibolet, O. et al. AKAP13, a RhoA GTPase-specific guanine exchange factor, genetic analyses. Y.G., Xiaojun Zhang, and J.Y. conducted the environment is a novel regulator of TLR2 signaling. J. Biol. Chem. 282, 35308–35317 (2007). adaptation analysis. S.L., Xiaojun Zhang, Y.G., K.Y.M., and J.Y. conducted the 54. Okuda, T. et al. Targeted disruption of Gb3/CD77 synthase gene resulted in molting regulation analysis. Y.Y., Q.W., and Y.S. conducted the population genetics the complete deletion of globo-series glycosphingolipids and loss of sensitivity analysis. S.L, Y.G., X.L., Xiaoxi Zhang, J.Z., and S.J. performed the validation to verotoxins. J. Biol. Chem. 281, 10230–10235 (2006). experiments. J.Y., Y.S., and C.L. submitted the genome data. C.L. performed the 55. Guo, S. Y., Li, S. H., Li, F. H., Zhang, X. J. & Xiang, J. H. Modification of a ncRNA analysis and constructed the shrimp genome database. J.X., F.L., Xiaojun synthetic LPS-binding domain of anti-lipopolysaccharide factor from shrimp Zhang, J.Y., S.L., Y.G., Y.Y., Y.S., and C.L. wrote the manuscript and additional supplementary files. K.H.C., K.Y.M., J.C., H.Z., Z.L., P.X., A.S., P.S., J.R., A.W., B.L., reveals strong structure-activity relationship in their antimicrobial fi characteristics. Dev. Comp. Immunol. 45, 227–232 (2014). and Lei Wang revised the manuscript. All authors read and approved the nal 56. Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel manuscript. counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011). 57. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Additional information Methods 9, 357–359 (2012). Supplementary Information accompanies this paper at https://doi.org/10.1038/s41467- 58. Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately annotate 018-08197-4. core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067 (2007). 59. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive Competing interests: The authors declare no competing interests. elements in genomic sequences. Curr. Protoc. Bioinform. Chapter 4, Unit 4 10 (2009). Reprints and permission information is available online at http://npg.nature.com/ 60. Trapnell, C., Pachter, L. & Lsalzberg, S. TopHat: discovering splice junctions reprintsandpermissions/ with RNA-Seq. Bioinformatics 25, 1105 (2009). 61. Haas, B. et al. Automated eukaryotic gene structure annotation using Journal peer review information: Nature Communications thanks the anonymous EVidenceModeler and the Program to Assemble Spliced Alignments. Genome reviewers for their contribution to the peer review of this work. Peer reviewer reports are Biol. 9, R7 (2008). available. 62. Zdobnov, E. M. & Apweiler, R. InterProScan—an integration platform for the – signature-recognition methods in InterPro. Bioinformatics 17, 847 848 (2001). Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in 63. Prakash, A., Jeffryes, M., Bateman, A. & Finn, R. D. The HMMER Web Server published maps and institutional affiliations. for protein sequence similarity search. Curr. Protoc. Bioinform. 60, 3.15.11–13.15.23 (2017). 64. Li, L., Stoeckert, C. J. Jr. & Roos, D. S. OrthoMCL: identification of ortholog – groups for eukaryotic genomes. Genome Res. 13, 2178 2189 (2003). Open Access This article is licensed under a Creative Commons 65. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post- – Attribution 4.0 International License, which permits use, sharing, analysis of large phylogenies. Bioinformatics 30, 1312 1313 (2014). adaptation, distribution and reproduction in any medium or format, as long as you give 66. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and appropriate credit to the original author(s) and the source, provide a link to the Creative high throughput. Nucleic Acids Res. 32, 1792–1797 (2004). Commons license, and indicate if changes were made. The images or other third party 67. Jones, D. T., Taylor, W. R. & Thornton, J. M. The rapid generation of mutation material in this article are included in the article’s Creative Commons license, unless data matrices from protein sequences. Comput. Appl. Biosci. 8,275–282 (1992). indicated otherwise in a credit line to the material. If material is not included in the 68. Yang, Z. PAML: a program package for phylogenetic analysis by maximum article’s Creative Commons license and your intended use is not permitted by statutory likelihood. Comput. Appl. Biosci. 13, 555–556 (1997). 69.Oakley,T.H.,Wolfe,J.M.,Lindgren,A.R.&Zaharoff,A.K. regulation or exceeds the permitted use, you will need to obtain permission directly from Phylotranscriptomics to bring the understudied into the fold: the copyright holder. To view a copy of this license, visit http://creativecommons.org/ monophyletic ostracoda, fossil placement, and Pancrustacean licenses/by/4.0/. phylogeny. Mol. Biol. Evol. 30,215–233 (2013). 70. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows © The Author(s) 2019 −Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

14 NATURE COMMUNICATIONS | (2019) 10:356 | https://doi.org/10.1038/s41467-018-08197-4 | www.nature.com/naturecommunications