<<

CORE Metadata, citation and similar papers at core.ac.uk

Provided by Nature Precedings

Interspecific differences in single n ucleotide polymorp his ms (SNPs) and in expressed sequence tag libraries of oil palm Elaeis guineensis and E. oleifera

Aikkal Riju 1 and Vadivel Arunachalam 2*

(1)Aikkal , Kanul, Kannur , Kerala - 670564,India (2) Central Plantation Crops Research Institute, Kudlu P.O, Kasaragod – 671124, Kerala, In- dia.

more than E. oleifera. We provide the re - ABSTRACT sults of the study as online database (http:/ / riju.byethost31.com /oilpalm /) Oil palm is the second largest source for use by oil palm breeders. of edible oil, which meets one- fifth of global demands of oils and fats. Ex- Keywords: , , Transi - tion, Trans- version, Single Nucleotide Poly- pressed sequence tag (EST) sequencing morphisms & programs have provided a wealth of in- INTRODUCTION formation, identifying novel genes from a broad range of organisms and provid - The Oil palm a tropical palm tree ing an indication of gene expression lev- originated in West Africa but has since el in particular tissues. It also provides been planted successfully in tropical re - the richest source of biologically useful gions within 20 degrees of the equator. SNPs due to the relatively high redun - Oil palm is the second largest source of dancy of gene sequence, the diversity of edible oil next only to soybean, which genotypes represented within databases. contributes approximately 20% of the EST based SNPs are potential molecular world's production of oils and fats. Oil

Nature Precedings : hdl:10101/npre.2009.3593.1 Posted 12 Aug 2009 markers and aid in genetic improve - is extracted from both the pulp of the ment. A total of 21062 and 2053 poly - fruit (palm oil, edible oil) and the kernel morphic (SNP and Indel) sites in E. (palm kernel oil, used mainly for soap guineensis species and in E. oleifera , manufacture). Both palm oil and palm 4955 SNPs and 1172 Indels were detect - kernel oil are high in olefins, a poten - ed. SNP(17.5/kbp) and Indel(4.1/kbp) tially valuable chemical group that can frequency was higher in E. oleifera than be processed into many non- food prod - E. guineensis species (16.8/kbp , ucts as well. There are cultivated two 1.6/kbp). E. oleifera showed higher species of oil palm, Elaeis guineensis to ratio (1.40) Jacq. from tropical western Africa and E. than in E. guineensis (1.02). The ratio of oleifera from Americas. African oil palm Ts vs Tv showed, the genetic divergence is widely cultivated and American oil is occurring in this crops in different palm is important in terms of compact fashion and E. guineensis had diverged growth habit and resistance to diseases. Oil palm has a large diploid genome of

Riju and Arunachalam 1 K.Takahashi et al.

3400 MB distributed in 32 chromo - of a crop species. Single Nucleotide somes. polymorphisms (SNPs) possess desirable Expressed Sequence Tags or ESTs properties as molecular markers. Bial- provide researchers with a quick and in- lelism makes them easy to score in high expensive route for discovering new throughput genotyping assays. Molecu - genes, for obtaining data on gene ex- lar genetic markers developed from pression and regulation, and for the ESTs can be used to examine a group of construction of the Genome maps. A individuals or populations to estimate single nucleotide polymorphism, or SNP various diversity measures and genetic (pronounced snip ), is a DNA sequence distances, infer genetic structure and variation occurring when a single nu - clustering patterns, test for Hardy- cleotide - A, T, C, or G - in the genome Weinberg equilibrium and multi- locus differs between members of a species equilibrium, and to test polymorphic (or between paired in an loci for evidence of selective neutrality. individual). Single Nucleotide Polymor - They are useful to plant breeders, phisms may fall within coding se - germplasm managers, and population quences of genes, non- coding regions of geneticists. The use of EST sequence genes, or in the intergenic regions. SNPs data for the identification of SNPs has within a coding sequence will not neces - many advantages that can be exploited sarily change the amino acid sequence to facilitate the development of highly of the protein that is produced, due to dense genetic maps and markers assist - degeneracy of the . If DNA ed breeding programs [Varshney et. al., sequence in which any change in the 2005]. SNPs can be used to saturate ge- base pairs do not result in the change of netic maps in plants [Bhattramakki and polypeptide sequence, then it is termed Rafalski, 2001]. synonymous (sometimes called a silent Recently EST resources of oil palm mutation) - if a different polypeptide are being developed at IRD, Montpellier sequence is produced, they are non- France and MPOB, Kuala Lumpur synonymous. SNPs that are not in pro - Malaysia tein- coding regions may still have con - (http:/ / palmoils.mpob.gov.my/palm - sequences for gene splicing, transcrip - genes.html ) and deposited at dbEST. Expressed sequence tag (EST) sequenc -

Nature Precedings : hdl:10101/npre.2009.3593.1 Posted 12 Aug 2009 tion factor binding, or the sequence of non- coding RNA. The study of single ing programs have provided a wealth of nucleotide polymorphisms is also im - information, identifying novel genes portant in crop and livestock breeding from a broad range of organisms and programs. Expressed sequence tags providing an indication of gene expres - (ESTs) are an important resource for sion level in particular tissues [Adams identifying polymorphisms in tran - et. al., 1995]. EST sequence data may scribed regions. provide the richest source of biologically Single nucleotide polymorphisms useful SNPs due to the relatively high (SNPs) have been shown to be the most redundancy of gene sequence, the diver - abundant source of DNA polymorphism sity of genotypes represented within in human, animal and plant genomes. databases, and the fact that each SNP SNPs are the most common type of alle- would be associated with an expressed les found within and between varieties gene [Picoult- Newberg et. al., 1999]. Re-

2 In Interspecific differences in single nucleotide polymorphisms (SNPs) and indels in expressed sequence tag libraries of oil palm Elaeis guineensis and E. oleifera

cent study of Oil palm EST libraries for Zap libraries hold very few ESTs, so we SNP detection showed that the genome ignored those libraries for this study. has the SNP frequency 1.36/100bp [Riju Majority of the sequences represents et. al., 2007 ]. We have used updated root, mesocarp tissue, young flower and EST libraries of Oil palm species for this mature flower. We have used the contig analysis to find the Interspecific differ - building package Cap3 [Huang and ences in SNP / Indel polymorphisms. Madan, 1999] server (http:/ / mobyle. - SNP detection perl script AutoSNP ver - pasteur.fr/cgibin/MobylePortal/por - sion.1.0 were used to find the SNP site tal.py#) to cluster the ESTs and made information and transition vs transver - necessary ace files of each tissue. Au- sion analysis [Barker et. al., 2003]. EST- toSNP version.1.0 was used to find the SNP can be detected by using other pro - candidate SNPs from these contigs. The grams or servers such as SEAN [Huntley transition(Ts) vs transversion(Tv) ratio et. al., 2006 ], PolyPhred [Nickerson et. of all the libraries to find the DNA sub - al., 1997], PolyBayes [Marth et. al., 1999], stitution in Oil palm genome was also TRACE_DIFF [Bonfield et. al.,1998 ], Hap - calculated. loSNPer [Tang et. al., 2008] and RESULTS AND DISCUSSION HarvEST(http:/ / harvest.ucr.edu) but Au- We found 21062 SNPs and 2053 In- toSNP provides user friendly approach dels sites in E. guineensis (Table 1). In E. and interpretable result as html file. oleifera 4955 SNPs and 1172 Indels were SNPs can be classified based on their found (Table 2). Considering those SNP nucleotide substitution as either transi - site by nucleotide substitution, , a total tion (G↔A or C↔T) or of 10635 transitions(Ts), 10427 transversion(C↔G, A↔T, C↔A or (Tv) and 2053 indel poly - T↔G). Indel sites can classified to four morphisms were detected in E. guineen - groups based on the nucleotide involved sis, where as in E. oleifera a total of (A/T/C/G). Thus there are ten kinds of 2891 transition, 2064 transversion and SNP/indel (two types of transition, four 1172 indel polymorphisms were detect - types of transversion and four groups of ed. SNPs occurred at a frequency of indels) are possible in genomes. 16.8/1kbp in E. guineensis species and MATERIALS AND METHODS 17.5/1kbp in E. oleifera species. Indel

Nature Precedings : hdl:10101/npre.2009.3593.1 Posted 12 Aug 2009 frequency 1.6/1kbp and 4.1/1kbp ob - EST database, dbEST of NCBI release served in E. guineensis and E. oleifera re - 121407 contains 16,999 Elaeis guineen - spectively. While finding the substitu - sis (GenBank: EL680966 - EL695503 and tion comparison, tissues like mature EE593287 - EE593337) and 3,205 Elaeis flower and suspension cell libraries oleifera (GenBank: EL563704 - ES414798 shows higher transversion sites than and BM402088, BM402089 and transition. The Indel polymorphism ratio EB643519 - EB643628) expressed se - (4.1/1kbp) found to be high in mesocarp quence tags. We have used five (root, tissue than other tissue libraries ob - mature flower, shoot apical meristems, served. The overall transition vs suspension cells and young flower) EST transversion ratio in E. oleifera is 1.4, libraries of Elaeis guineensis [Jouannic which shows that the relative increase of et. al., 2005], [Chai- Ling et. al., 2007] transition over transversion. Though in and one EST library (mesocarp) of Elaeis case of E. guineensis the ratio is 1.02, oleifera . Zygotic embryo and Lambda which shows that the transition as well

Riju and Arunachalam 3 K.Takahashi et al.

as transversion occurring in a similar and Klein (1999) identified 5 SNPs in fashion. Transversion was high in E. 1kbp of cDNA of Picea rubens and Picea guineensis and Indels were reported mariana , and also discovered SNPs in higher in E. oleifera. Estimation of the Ts the chloroplasts of these species. In Vs Tv rate is important to understand soyabean( Glycine max ), 5 SNPs were the pattern of DNA sequence evolution. found approximately every 1kbp It has been repeatedly noted that at low [Coryell et. al., 1999],[Van et. al., 2004]. level of genetic divergence, ts/tv ap - In maize (Zea mays ), SNPs occur at fre - pears to be high and at high levels of quently, with one SNP approximately ev- genetic divergence, ts/tv appears to be ery 48 bp and every 130 bp in 3’ un - low. By concluding the transition and translated regions and coding regions, transversion bias result, in our case the respectively [Tenaillon et. al., 2001], transversion have occurred approxi - [Rafalski, 2002]. SNPs occur at very low mately similar manner in E. guineensis frequency of .07 every 1kbp in apple and slightly variable in E. oleifera. It may (Malus domestica ) ESTs [Newcomb et. al., because of the higher level of genetic di - 2006]. Comparing these crops the SNPs vergence has occurred so far in in oil palm is more frequent as observed E.guineensis than E.olefera. AFLP and in maize and agree with earlier study RFLP markers revealed that genetic di- [Riju et.al., 2007] on this crop with few - vergence between the two species is of er EST libraries (SNP frequency as 1.36 the same magnitude as that among SNPs per 100 bp with a higher rate of provenances of E.olefera (Barcelos et. al., transition over transversion) . In slico ap - 2000). The genetic diversity relationship proach to ESTs revealed the SNPs in oil between the two Elaeis species by AFLP palm as more abundant hence a valuable and RFLP suggest that AFLP markers, is tool to find the genetic diversity and more important than the divergence de - breeding programs along with AFLP and tected by RFLP markers (Barcelos et al., RFLP. Our study will help oil palm re - 2002). EST analysis shows an insight of searchers about the single nucleotide intrerspecific molecular genetic varia - polymorphism and nucleotide substitu - tions within oil palm. These transitions tions. The tissue wise EST clusters and to transversions ratio showed the their SNP and Indel site information is happening in this made available at Nature Precedings : hdl:10101/npre.2009.3593.1 Posted 12 Aug 2009 crop with different fashion. Only fewer www.riju.byethost31.com/oilpalm . EST resources were generated on E. CONCLUSION oleifera species yet. Even though with Available est resources at dbest were available source has to believe that the analysed for the putative snp and indels molecular divergence is happening in E. in oil palm. Snps occurred at a guineensis species than E. oleifera . frequency of 16.8/1kbp in e. Guineensis Recent SNP report on beetroot ex- species and 17.5/1kbp in e. Oleifera pressed gene showed the SNP frequency species. Indel frequency 1.6/1kbp and as 1 every 130bp [Schneider et. al., 2001] 4.1/1kbp observed in e. Guineensis and and ESTs [Batley et.al., 2003 ] of maize e. Oleifera respectively. There are very also reported relative increase of transi - few ests are available for e. Oleifera tion sites over transversion. Germano when comparing with the e. Guineensis species. Snp frequency was observed

4 In Interspecific differences in single nucleotide polymorphisms (SNPs) and indels in expressed sequence tag libraries of oil palm Elaeis guineensis and E. oleifera

approximately equal in both species but Batley, J., Barker, G., Helen, O’Sullivan., Ed- a higher frequency of indels were wards, K. J. and Edwards, D.(2003). Mining observed in e. Oleifera species for Single Nucleotide Polymorphisms and (4.1/1kbp). Transition to transversion Insertions/Deletions in Maize Expressed Se- ratio was lower (1.02) in e. Guineensis quence Tag Data Plant Physiology. 132 , 84– 91. species than e. Oleifera (1.4). This transition to transversion ratio shows the different level of molecular Bhattramakki, D. and Rafalski, A(2001). Dis- evolution happening in this crop. Only covery and application of single nucleotide fewer est resources were generated on e. polymorphism markers in plants. In: Henry Oleifera species yet. Even though with R.J. (ed.) Plant Genotyping: the DNA Finger - available source has to believe that the printing of plants. CABI Publishing, Oxon, molecular divergence is happening in e. UK, pp. 179- 191. Guineensis species than e. Oleifera . This in silico analysis on oil palm shows the Bonfield, J.K., Rada, C. and Staden, R.(1998). potential snp markers for use in oil Automated detection of point palm breeding and the database we using fluorescent sequence trace subtrac - created would help to use the tion. Nucleic Acids Res., 26 , 3404–3409. information in designing new primers and develop more markers and saturate Chai- Ling, H., Yen- Yen,K., Mei- Chooi, C., the linkage maps. The study also Sue- Sean, T., Wai- Har, N., Kok- Ang, L., highlights the nucleotide substitution Yang- Ping, L., Siew- Eng, O., Weng- Wah, L., analysis in available oil palm EST Jin- Ming, T., Siang- Hee, T., resources. Kulaveerasingam, H., Sharifah, S.R.S.A. and REFERENCES Meilina, O.A. (2007). Analysis and functional Adams, M.D., Kerlavage, A.R., Fleischmann, annotation of expressed sequence tags R.D., Fuldner, R.A., Bult, C.J., Lee, N.H., (ESTs) from multiple tissues of oil palm Kirkness, E.F., Weinstock, K.G., Gocayne, (Elaeis guineensis Jacq.) BMC Genomics J.D. and White ,O (1995). Initial assessment 8:381. of human gene diversity and expression patterns based upon 83- million nucleotide of cDNA sequence. Nature. 377 : 3. Coryell, V.H., Jessen, H., Schupp, J.M., Webb,

Nature Precedings : hdl:10101/npre.2009.3593.1 Posted 12 Aug 2009 D. and Keim, P.(1999). Allele- specific hy - Barceló’s, E., Amblard, P., Berthaud, J. and bridisation markers for soybean. Theor Appl Seguin, M(2002). Genetic diversity and rela - Genet. 101 : 1291–1298. tionship in American and African oil palm as revealed by RFLP and AFLP molecular markers. Pesq. agropec. bras., Brasília, v. Germano, J. and Klein, A.S. (1999). Species 37 , n. 8, 1105- 1114. specific nuclear and chloroplast single nucleotide polymorphisms to distinguish Picea glauca , P. mariana and P. rubens . Barker, G., Batley, J., O’sullivan, H., Edwards, Theor Appl Genet. 99 , 37–49. K.J., and Edwards, D.(2003). Redundancy based detection of sequence polymor - phisms in expressed sequence tag data us - Huang, X. and Madan, A. (1999). CAP3: a ing autoSNP. Bioinformatics . 19 : 421- 422. DNA sequence assembly program. Genome Res. 9, 868–877.

Riju and Arunachalam 5 K.Takahashi et al.

Huntley, D., Baldo, A., Johri, S. and Sergot, M.(2006). SEAN: SNP prediction and display Rafalski, A. (2002). Applications of single program utilizing EST sequence clusters. nucleotide polymorphisms in crop genet - Bioinformatics. 22(4) : 495- 496. ics. Curr Opin Plant Biol. 5, 94–100.

Jouannic, S., Argout,X., Lechauve, F., Riju, A., Chandraseker, A. and Arunachalam, Fizames, C., Borgel, A., Morcillo, F., Aber - V.(2007). Mining for single nucleotide poly - lenc- Bertossi, F., Duval, Y. and Treger, morphisms and insertions / deletions in ex- J.(2005). Analysis of expressed sequences pressed sequence tag libraries of oil palm. tags from oil palm (Elaeis guineensis) . Bioinformation. 2(4), 128- 131. FEBS . 579 , 2709 – 2714.

Schneider, K., Weisshaar, B., Borchardt, D.C. Marth, G.T., Korf, I., Yandell, M. D., Yeh, R. and Salamini, F.(2001). SNP frequency and T., Zhijie, G., Zakeri,H., Stitziel, N.O., Hillier, allelic haplotype structure of Beta vulgaris L., Kwok,P.Y. and Gish, W. R.(1999). A gen - expressed genes. Mol. Breed. 8, 63- 74. eral approach to single- nucleotide poly - morphism discovery. Nat. Genet . 23 , 452– 456. Tang, J.,Leunissen, J.AM., Voorrips, R.E., Lin- den, C.G. and Vosman, B.(2008). HaploS- NPer: a web- based allele and SNP detection Newcomb R. D., Crowhurst, R.N., Gleave, tool. BMC , 9:23. A.P., Rikkerink, E.H.A., Allan, A.C., Beuning, L.L., Bowen, J.H., Gera, E., Jamieson, K.R., Janssen, B.J., Laing, W.A., McArtney,A., Nain, Tenaillon, M.I., Sawkins, M.C., Long, A.D., B., Ross, G.S., Snowden, K.C., Souleyre, E.J.F., Gaut, R.L., Doebley, J.F. and Gaut, B.S. Walton, E. F. and Yauk, Y.K. (2006) Analyses (2001). Patterns of DNA sequence poly - of Expressed Sequence Tags from Apple. morphism along 1 of maize Plant Physiology, 141 , 147–166. (Zea mays ssp mays L.). Proc Natl Acad Sci USA. 98 : 9161–9166. Nickerson, D.A., Tobe, V.O. and Taylor, S. L. (1997). PolyPhred: automating the detec - Van K, Hwang EY, Kim MY, Kim IH, Cho YI, tion and genotyping of single nucleotide Cregan PB, Lee SH. Discovery of single nu -

Nature Precedings : hdl:10101/npre.2009.3593.1 Posted 12 Aug 2009 substitutions using fluorescence- based cleotide polymorphisms in soybean using resequencing. Nucleic Acids Res. 25, 2745– primers designed from ESTs. Euphytica. 2751. 2004; 139 :147–157.

Picoult- Newberg, L., Idenker, T.E., Pohl, Varshney, R. K., Graner, A. and Sorrells, M. M.G., Taylor, S.L., Donaldson, M.A., Nicker - E. (2005). Genic markers in son D.A. and Boyce- Jacino, M. (1999). Min- plants: features and applications. Trends in ing SNPs from EST databases. Genome Res. Biotechnology. 23 : 48- 55. 9: 167- 174.

6 In Interspecific differences in single nucleotide polymorphisms (SNPs) and indels in expressed sequence tag libraries of oil palm Elaeis guineensis and E. oleifera

Table 1. A total of 6332 consensus EST sequence are used to predict the SNP site from E. guineensis species, which made 1922 cluster groups and found 23115 SNP- Indel sites.

Tissue No of No of Consensus SNP site Transi - Transver - Indels Ts / Frequen - Frequen - Name ESTs contigs size (bp) tions (Ts) sions (Tv) Tv cy of in- cy of dels / SNPs / kbp kbp Root 223 719 415106 6070 3132 2938 736 1.066 1.8 14.6 0 Mature 131 376 255349 4736 2320 2416 322 0.960 1.3 18.5 flower 7 Shoot apex 684 215 141729 3249 1685 1564 268 1.077 1.9 22.9 Suspen - 539 174 124035 2498 1217 1281 173 0.950 1.4 20.1 sion cell cultures Young 156 438 317548 4509 2281 2228 554 1.024 1.7 14.2 flower 2 Total 633 1922 1253767 21062 10635 10427 2053 1.02 1.6 16.8 2

Table 2. A total of 1403 consensus EST sequence are used to predict the SNP site form E.Oleifera species, which made 404 cluster groups and found 6127 SNP- Indel sites. No of Consen - Frequency Frequency Tissue No of SNP Transi - Transver - con - sus size Indels Ts / Tv of indels of SNPs per Name ESTs Site tions(Ts) sions (Tv) tigs (bp) per kbp kbp Meso - 1403 404 283721 4955 2891 2064 1172 1.401 4.1 17.5 carp tis - sue Total 404 2891 2064 1.401 4.1 17.5 Nature Precedings : hdl:10101/npre.2009.3593.1 Posted 12 Aug 2009 1403 283721 4955 1172

Riju and Arunachalam 7