Comparative Transcriptome Analysis of Non-Model Taxus Species for Taxol Biosynthesis
Total Page:16
File Type:pdf, Size:1020Kb
Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2015, 7(3):800-805 ISSN : 0975-7384 Research Article CODEN(USA) : JCPRC5 Comparative transcriptome analysis of non-model Taxus species for taxol biosynthesis Prashant Saxena and Balasankar Karvadi * Department of Bioinformatics, Sathyabama University, Chennai, Tamilnadu, India _____________________________________________________________________________________________ ABSTRACT Taxol is a known drug, approved by the food and drug administration in 1992 for the treatment of a wide range of cancers. Taxol is obtained from the taxus plant. Taxus plant has a very long life cycle and the production of taxol from the plant is a tedious process. In this study, we have tried to find new species of taxus which can be able to produce taxol and can overcome the burden of Taxus baccata plant. For this study we have used the next generation sequencing pipeline. In this study, we have collected three transcriptome data of taxus species namely Taxus X media, taxus cuspidata, and Taxus baccata. The transcriptome data contain lots of noise, to remove that noise quality check study were performed to generate high quality data. Once the high quality data were obtained trinity tool was used to generate the high quality reads. Blastx was used to find the homologous enzymes against the high quality reads. The mapping and annotation studies were performed to find out the function of the homologous enzymes. This comparative transcriptome analysis revealed that the enzymes present in the Taxus X media shows higher similarity with Taxus baccata and contains major enzymes which are involved in the taxol biosynthetic process. So the overall postulates of this study decipher that Taxus X media may be used as a new source for taxol production and it may overcome the burden of taxus plant. Keywords: Taxol, Taxus species, NGS, Transcriptome. _____________________________________________________________________________________________ INTRODUCTION The human genome is composed of deoxyribonucleic acid (DNA), a molecule which is very long and occupies helical structure that contains the instructions needed to build and characterize cells. For these instructions to be accomplished, DNA must be transcribed into corresponding molecules of ribonucleic acid (RNA), referred to as transcripts. A transcriptome is a compilation of all the transcripts available in a given cell [1]. There are diverse kinds of RNA. The dominant type, known as messenger RNA (mRNA), plays a fundamental role in making proteins. In this process, mRNA transcribed from genes, which include the protein-coding region of the genome, is delivered to ribosomes, which are molecular machines present in the cell's cytoplasm. The ribosomes read, or "translate," the sequence of the chemical letters in mRNA to assemble building blocks called amino acids into proteins. The mRNA is transcribed from a gene and after that translated into a particular protein. Characterization of transcriptome on a global gene expression level is an optimal application of short-read sequencing. Commonly such analysis suggests complementary DNA (cDNA) library construction, Sanger sequencing of ESTs, and microarray analysis. A new approach knew as NGS (Next generation sequencing) has become a reliable method for increasing sequencing demand and coverage while reducing time and cost compared to the conventional Sanger’s method [2]. Next-generation sequencing (NGS) technology has the broad range potential of mRNA sequencing to affirm the miraculous complexity of the transcriptomes. The transcriptome sequencing approach brings quick insight of a gene, help in identifying the gene of interest, analysis of gene expression and comparative genomic studies of an organism. Although these techniques are becoming cheaper, transcriptome sequencing is still remains an expensive 800 Prashant Saxena and Balasankar Karvadi J. Chem. Pharm. Res., 2015, 7(3):800-805 ______________________________________________________________________________ endeavour. The major challenges are the assembly of millions and billions of RNA-seq reads to construct the complete transcriptome poses with the wide range information [3]. Taxus is a genus of yews, small coniferous trees or shrubs in the yew from kingdom Plantae, division Pinophyta, class Pinopsida, order Pinales, family Taxaceae. They are relatively slow-growing and can be very long-lived, and reach heights of 1–40 m, with trunk diameters of up to 5 m [4]. They have reddish bark, lanceolate, flat, dark- green leaves 1–4cm long and 2–3mm broad, arranged spirally on the stem, but with the leaf bases twisted to align the leaves in two flat rows either side of the stem [5]. Taxus is a gymnosperm genus of yews, small coniferous trees or shrubs in the yew from kingdom Plantae, division Pinophyta, class Pinopsida, order Pinales, family Taxaceae. The life cycle of these plants is relatively very long, and they are very slow growing. Normal heights of these trees are 1 cm to 40cm and diameter is up to 5cm. Taxus plants are very rich in economic value, have medical importance and produce a drug named Taxol [6] which is used for the treatment of a wide range of cancer including colon cancer and breast cancer. Some recent studies decipher that Taxol (Paclitaxel) [7] may also be useful in the treatment of alzheimer’s disease [8]. Taxol is an anti-cancer chemotherapy drug. Taxol is classified as a "plant alkaloid," a "taxane" and an "antimicrotubule agent” [9]. Taxol (Paclitaxel) [7] is one of the natural diterpenoid from the bark of the yew ( Taxus brevifolia) [10]. It has the capability to destroy the tumor cells by build-up the assembly of microtubules and constrain their depolymerisation. It is an approved drug by the food and drug administration (FDA) for the treatment of a wide range of cancers using chemotherapy. For the commercial production of taxol very few species are being used till date, among them Taxus baccata is commonly used species, due to which the burden of taxol production from this plant increasing day by day and it is tough for the pharmaceutical companies to cope up with the market demand of taxol with very few known species. So to keep this challenge in our mind we have tried to find the other Taxus species which may also be able to produce taxol to decrease the burden of Taxus baccata plant and met with the demand of market. EXPERIMENTAL SECTION Data Collection The transcriptome data of the three taxus species i.e. Taxus baccata (SRR065067) , Taxus cuspidata (SRR032523)and Taxus X media (SRR534003)were collected from the DDBJ-DRA [11] (http://trace.ddbj.nig.ac.jp/DRASearch/) database. The transcriptome data contain83913444, 60173416 and 1225567350 bases respectively. Pre-Processing of Reads The transcriptome pre-processing analysis comprises of three stages namely- Primary analysis, Secondary analysis and Tertiary analysis. These steps together comprise the pre-processing of reads. QC is an important and effective measure for determining sample libraries’ qualities. Primary analysis of the transcriptome data includes scrutinizes the quality of the reads. Once we understand the quality of the reads we can make the decision over the data. If the quality of the obtained data is not satisfactory, various cleaning measures can be used to make the data useful. For this current study for the primary analysis of the data we have used two tools i.e. FASTQC and NGSQC Toolkit [12]. De-Novo Assembly of Reads De-novo is referring to the start from beginning and de-novo assembly refers to the process of creating a transcriptome without the help of a reference genome. This approach is preferred over other methods to study the non-model organism. These assembled transcriptomes are able to decipher the peculiar protein which plays an important role in the biological process of the organism. In this study we have used DDBJ read annotation pipeline [13] [14] Trinity [15] tool for the assembly of the filtered reads. Functional Annotations The process of functional annotation refers to the annotation of the transcriptomes. Annotation includes methods to add biological information to the raw DNA sequence, identify the structural and functional elements, and integrate and display this information at a genomic level. Functional annotation of the transcriptomes was performed to identify the expressed genes present in the transcriptome of all three taxus species. The functional annotations of the 801 Prashant Saxena and Balasankar Karvadi J. Chem. Pharm. Res., 2015, 7(3):800-805 ______________________________________________________________________________ three transcriptomes were performed using Blast2Go PRO package. Blast2Go PRO [16] is a commercial package, developed by BioBam bioinformatics solutions. RESULTS The three transcriptome sequences of Taxus species were retrieved from the DDBJ-DRA database. Once the transcriptome data were downloaded the first step is the quality check of the data to verify the quality of the data. Quality Check Transcriptome data of any organism contains lots of the noise in it. So the first step of the quality check is to identify the noise of the data and region in which noise resides. (A) (B) (C) Figure 1: Raw reads (A) Taxus X media (B) Taxus cuspidata (C) Taxus baccata Once the noise of the data is been identified the next step is to clean the data and make this data useful. To make this data useful we have used NGSQC toolkit for the quality check. We used Illuqc and Trimming tool to increase the quality of the data and removal of unwanted data from the all three transcriptome sequence of Taxus species. (A) (B) (C) Figure 2: Filtered reads (A) Taxus X media (B) Taxus cuspidata (C) Taxus baccata The graph shown in the figure is box-whisker graph, which is divided into three different colours. The green colour represents the very high quality reads, orange colour shows good quality reads while red colour shows the low quality reads. Figure-1 shows the low quality reads which contains a high amount of noise in the data while figure-2 contains the filtered reads which is noise free.