Fosmid Library End Sequencing Reveals a Rarely Known Genome Structure of Marine Shrimp Penaeus Monodon
Total Page:16
File Type:pdf, Size:1020Kb
eScholarship Title Fosmid library end sequencing reveals a rarely known genome structure of marine shrimp Penaeus monodon Permalink https://escholarship.org/uc/item/126680ch Journal BMC Genomics, 12(1) ISSN 1471-2164 Authors Huang, Shiao-Wei Lin, You-Yu You, En-Min et al. Publication Date 2011-05-17 DOI http://dx.doi.org/10.1186/1471-2164-12-242 Supplemental Material https://escholarship.org/uc/item/126680ch#supplemental Peer reviewed eScholarship.org Powered by the California Digital Library University of California Huang et al. BMC Genomics 2011, 12:242 http://www.biomedcentral.com/1471-2164/12/242 RESEARCHARTICLE Open Access Fosmid library end sequencing reveals a rarely known genome structure of marine shrimp Penaeus monodon Shiao-Wei Huang1, You-Yu Lin1, En-Min You1, Tze-Tze Liu2, Hung-Yu Shu2, Keh-Ming Wu3, Shih-Feng Tsai3, Chu-Fang Lo1, Guang-Hsiung Kou1, Gwo-Chin Ma4, Ming Chen1,4,5, Dongying Wu6,7, Takashi Aoki8, Ikuo Hirono8 and Hon-Tsen Yu1* Abstract Background: The black tiger shrimp (Penaeus monodon) is one of the most important aquaculture species in the world, representing the crustacean lineage which possesses the greatest species diversity among marine invertebrates. Yet, we barely know anything about their genomic structure. To understand the organization and evolution of the P. monodon genome, a fosmid library consisting of 288,000 colonies and was constructed, equivalent to 5.3-fold coverage of the 2.17 Gb genome. Approximately 11.1 Mb of fosmid end sequences (FESs) from 20,926 non-redundant reads representing 0.45% of the P. monodon genome were obtained for repetitive and protein-coding sequence analyses. Results: We found that microsatellite sequences were highly abundant in the P. monodon genome, comprising 8.3% of the total length. The density and the average length of microsatellites were evidently higher in comparison to those of other taxa. AT-rich microsatellite motifs, especially poly (AT) and poly (AAT), were the most abundant. High abundance of microsatellite sequences were also found in the transcribed regions. Furthermore, via self- BlastN analysis we identified 103 novel repetitive element families which were categorized into four groups, i.e., 33 WSSV-like repeats, 14 retrotransposons, 5 gene-like repeats, and 51 unannotated repeats. Overall, various types of repeats comprise 51.18% of the P. monodon genome in length. Approximately 7.4% of the FESs contained protein- coding sequences, and the Inhibitor of Apoptosis Protein (IAP) gene and the Innexin 3 gene homologues appear to be present in high abundance in the P. monodon genome. Conclusions: The redundancy of various repeat types in the P. monodon genome illustrates its highly repetitive nature. In particular, long and dense microsatellite sequences as well as abundant WSSV-like sequences highlight the uniqueness of genome organization of penaeid shrimp from those of other taxa. These results provide substantial improvement to our current knowledge not only for shrimp but also for marine crustaceans of large genome size. Background consumption [1]. Given their primarily aquatic habitats, Crustaceans (lobster, shrimp, crab, etc.), a remarkable however, they are not as well studied as insects, their ter- group of organisms filling up all types of habitats in the restrial arthropod relatives. ocean with a wide array of adaptations, possess the great- The tiger shrimp (Penaeus monodon) has been one of est species diversity among marine animals. They are the most important captured and cultured marine crusta- not only abundant in number, but also are among the ceans in the world, especially in the Indo-Pacific region most commercially exploited food species for human [1,2]. However, the tiger shrimp industry has been pla- gued by viral diseases [3-5], resulting in substantial eco- * Correspondence: [email protected] nomic losses. Developments in shrimp genomics have 1Institute of Zoology and Department of Life Science, National Taiwan University, Taipei 10617, Taiwan, ROC been limited although a reasonably good EST database is Full list of author information is available at the end of the article available (Penaeus Genome Database; http://sysbio.iis. © 2011 Huang et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Huang et al. BMC Genomics 2011, 12:242 Page 2 of 19 http://www.biomedcentral.com/1471-2164/12/242 sinica.edu.tw/page/) [6]. A genomic analysis for the tiger analyzed with NotI. The average insert sizes (40.8 ± shrimp will make a key contribution to deciphering the 3.6 kb) were close to the expected 40 kb (Additional evolutionary history representing the crustacean lineages, file 2). Therefore, the P. monodon fosmid library covers especially those living in the ocean. The information con- 5.3× haploid genome equivalents based on an estimate tained in the genomic sequences will also benefit the of 2.17×109bp per haploid genome. shrimp industry by offering genomic tools to fend off the viral diseases and to improve the breeding program. Fosmid-end-sequence (FES) analysis The genome size of the penaeid shrimp is estimated to A total of 20,926 high-quality FESs (GenBank accession be 2/3 of the human genome [7] and thus an order of number JJ726384-JJ747309) with read lengths of ≥100 bp magnitude lager than the model invertebrates, Caenorhab- (Additional file 3) were obtained from 11,850 fosmid ditis elegans and Drosophila melanogaster. Concerning clones. Of the 11,850 fosmid clones, 9,072 clones had their larger genome size than other invertebrates, we are both end sequences present in our FESs. The length of most interested in knowing what the makeup of genomic the FESs ranged from 100 bp to 861 bp, with an average DNA in the tiger shrimp genome is. Our initial attempt to read length of 531 bp. A total of 11,114,786 bp of geno- sequence a few fosmid clones was hindered by an unusual mic sequences were generated from this study, represent- high percentage of failure in sequencing reactions and by ing approximately 0.45% of the P. monodon genome. The difficulties in assembling contigs, rousing suspicion that P. monodon genome appeared to be AT-rich, with GC the shrimp genome is extraordinarily repetitive in nature. content of 45.88%. This is the first estimate of GC con- Consequently we set out to have a glimpse of the genomic tent in a marine shrimp. structure by sequencing ends of fosmid clones. The results would offer insights to whole genome sequencing with Repetitive sequence analysis appropriate and effective strategies. To achieve this aim, Repetitive sequences comprise an important part of eukar- we constructed a P. monodon fosmid library from a female yotic genomes, and each species has its own characteristic shrimp and made an initial analysis of 20,926 high-quality repetitive sequences. The overall constitution of repetitive end sequences, a total of 11,114,786 bp representing 0.45% elements in the P. monodon genome was assessed by of the whole genome. The results provide substantial RepeatMasker. Of the 20,926 fosmid end reads, 49.82% improvement to our current knowledge not only for (10,425/20,926) contained repeats (against A. gambiae shrimp but also for the genomic structure of invertebrates repeat database). In terms of lengths, 15.49% and 15.44% with large genomes. of base pairs were repeatmasked against D. melanogaster and A. gambiae repeat database, respectively (Table 1). Results In spite of similar proportions of repetitive sequences Estimation of the P. monodon genome size masked by two different, i.e., D. melanogaster and A. gam- ThegenomesizeofP. monodon has never been deter- biae, databases, the lengths allocated in major repeat types mined experimentally and therefore we measured DNA were different (Table 1). The length of transposable ele- content of hemocytes of P. monodon with flow cytometry, ments (both retrotransposons and DNA transposons) using human lymphocytes as standardized control. In masked in the D. melanogaster database (54,154 bp) was addition, the white shrimp (P. vannamei) genome, whose much less than in the A. gambiae database (109,658 bp), size is known, was used as a reference. The 1C nuclear while the length of simple repeats masked in the D. mela- DNA content of P. monodon was estimated to be ~72.2% nogaster database (858,898 bp) was larger than in the of the human genome, i.e., ~ 2.53 pg DNA per nucleus or A. gambiae database (807,927 bp). 2.17×109bp per haploid genome. We also obtained the Of all repeat types, simple repeats were the most abun- DNA content of P. vannamei to be 71.5% of the human dant type, identified in approximately 7.5% (7.73% against genome (Additional file 1), which is consistent with the D. melanogaster and 7.27% against A. gambiae)ofthe valuepreviouslyreportedbyChowet al.[7].The1C total 11,114,786-bp FESs and accounting for nearly half of value of P. monodon is close to those previously reported the repetitive sequences. Low complexity repeats (3.48% four other penaeid shrimp species (2.37-2.51 pg for average over two databases) and small RNAs (3.77% aver- P. aztecus, P. duorarum, P. vannamei,andP. setiferus; age over 2 databases) were two other abundant repeat see Chow et al. [7]). types (Table 1). Interspersed repeats (mainly retrotranspo- sons and DNA transposons) were the least abundant Construction and characterization of the fosmid library (0.74% average over two databases), accounting for only a The constructed P. monodon fosmid library consists of a small fraction (4.76%) of repetitive sequences in length. total of 288,000 clones arrayed in 750×384-well microti- Among transposable elements, long terminal repeat (LTR) ter plates.