European Review for Medical and Pharmacological Sciences 2014; 18: 3504-3510

Both genes and lncRNAs can be used as biomarkers of prostate cancer by using high throughput sequencing data

W.-S. CHENG¹, H. TAO¹, E.-P. HU¹, S. LIU¹, H.-R. CAI¹, X.-L. TAO¹, L. ZHANG¹, J.-J. MAO², D.-L. YAN¹

¹Department of Urology, Taizhou Municipal Hospital, Taizhou, Zhejiang, China
²Department of Infectious Diseases, Taizhou Municipal Hospital, Taizhou, Zhejiang, China

Abstract. – OBJECTIVE: To investigate prostate cancer-related genes and lncRNAs by using a high throughput sequencing dataset.

MATERIALS AND METHODS: RNA-seq data were obtained from the sequencing read archive database, including both benign and malignant tumor samples. After aligning the RNA-seq reads to the human genome reference, the gene expression profile as well as the lncRNA expression profile was obtained. Next, Student's t-test was used to screen both the differentially expressed genes (DEGs) and lncRNAs (DELs) between benign and malignant samples. Finally, Goseq was used to conduct the functional annotation of DEGs.

RESULTS: A total of 7112 DEGs were screened, such as ZNF512B, UCKL1, STMN3, GMEB2, and PTK6. The top 10 enriched functions of DEGs were mainly related to organism development, including multi-cellular development, system development and anatomical structure development. Also, we discovered 26 differentially expressed lncRNAs.

CONCLUSIONS: The analysis used in this study is reliable in screening prostate cancer markers including both genes and lncRNAs by using RNA-seq data, which provides new insight into the understanding of the molecular mechanism of prostate cancer.

Keywords: Prostate cancer, RNA-seq, lncRNA, DEG, DEL.

Introduction

Prostate cancer is one type of adenocarcinoma (also named glandular cancer), which begins when normal semen-secreting prostate gland cells mutate into cancerous cells. Prostate cancer is the second leading cause of death from cancer in men, and the most common noncutaneous cancer among men in the western world [1]. Most patients with metastatic prostate cancer still die from this disease, despite the transient efficacy of androgen deprivation therapy [2]. Thus, it is of great importance to understand the molecular mechanism of the development of prostate cancer so as to improve treatment of this lethal disease.

Prostate cancer risk has been shown to have a strong genetic component [3]. Several genome-wide association studies have identified numerous common variants conferring risk of prostate cancer [4,5]. A number of regions across the genome have been reported to be associated with prostate cancer, including chromosome aberrations [6], gene mutations and gene fusions [7]. Genes such as HPC1 [8], HPCX [9], and p53 [10] have been suggested to be involved in the pathways associated with prostate cancer. Although many studies have been performed to reveal the genetic changes associated with the development of prostate cancer, the full spectrum of prostate cancer genomic alterations remains incompletely characterized, to say nothing of the lncRNA (long non-coding RNA) markers that may participate in prostate cancer genesis.

In the current study, the RNA-seq data of benign and malignant prostate cancer samples were collected to profile both gene expression and lncRNA expression, followed by a statistical test to screen the differentially expressed genes (DEGs) and differentially expressed lncRNAs (DELs), as well as a functional enrichment analysis. Our study may add to the understanding of the molecular mechanism of prostate cancer in terms of coding genes and long non-coding RNAs, and may help to reveal the potential mechanism of the progression from benign status to malignant status.
Corresponding Author: Dongliang Yan, MD; e-mail: [email protected]

Materials and Methods

RNA-seq Data Acquisition and Analysis

Sequencing data of prostatic cancer were obtained from the Sequencing Read Archive (SRA) database on the NCBI website. The data consist of 3 benign tumor samples and 3 malignant tumor samples [11]. Annotation information of the sequencing data was also downloaded from the Illumina GAIIx platform. The accession numbers were SRR073760, SRR073761, and SRR073762 for benign samples, and SRR073769, SRR073770, and SRR073771 for malignant samples. The quality of the sequencing data was evaluated by using a tool called FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Parameters include the GC and AT contents, number of total reads, and average length of the reads.

Reads Alignment to Human Genome and RNA Expression Quantification

The sequencing data were aligned to the human genome (hg19) downloaded from UCSC (University of California Santa Cruz) using TopHat [12]. The parameter was set as -g 1, and only the sequences with a unique location were kept to quantify RNA expression. Ensembl gene annotation was used to annotate the gene expression profiles across all 6 samples, and genes with at least one read count were defined as expressed genes. The raw expression file was normalized using the RPKM (Reads Per Kilobase of exon per Million mapped reads) method. We used the same procedure to profile the expression matrix of lncRNA, of which the annotation was downloaded from the Human Body Map lincRNAs database [13]. RPKM was also used to normalize the expression of lncRNA across all samples. lncRNAs with RPKM > 1 were defined as expressed lncRNAs.

Principal Component Analysis and Hierarchical Cluster Analysis

We performed a principal component analysis by using R's 'princomp' method in the 'stats' package for the gene expression profile and the lncRNA expression profile respectively. Benign samples and cancerous samples are labeled with different colors. Further, we also made a hierarchical cluster analysis for both gene expression and lncRNA expression by using the Euclidean distance.

DEGs and DELs Analysis

Aiming to screen prostate cancer associated RNA molecules including coding genes and long non-coding RNAs, we performed a Student's t-test for the gene expression profile and the lncRNA expression profile separately to distinguish differentially expressed molecules between benign samples and malignant samples. To evaluate the false discovery rate (FDR), we shuffled the benign samples and malignant samples 100,000 times randomly and redid the analysis to calculate the random p value distribution.

Functional Enrichment Analysis of DEGs

For RNA sequencing data, the length of different genes varies largely, and this could introduce bias and reduce the robustness of GO enrichment analysis. Goseq [14] was suggested to be able to eliminate the bias caused by gene length difference. Hence we used this tool to analyze the enriched Gene Ontology terms and infer the functions of DEGs. GO terms with a p value less than 0.05 after Bonferroni correction were defined as significantly enriched GO terms.

Results

RNA-seq Data Alignment and Quality Analysis

After aligning the sequencing reads to the human genome (hg19), we found that on average about 90% of the reads can be mapped back to the human genome reference (Table I), indicating a good mapping ratio as well as a large proportion of usable data for the downstream analysis. Also, we analyzed the quality of the sequencing data to investigate parameters such as GC content, total reads number, median reads length and average base quality value (Table I).

Table I. Statistics of the raw RNA-seq data.

Sample name   GC content   AT content   Total reads   Mapped reads   Mapping ratio   Average length
SRR073760     53%          47%          5300188       4823172        91%             35
SRR073761     53%          47%          5347764       4759514        89%             35
SRR073762     54%          46%          4778245       4300420        90%             35
SRR073769     53%          47%          8175900       7358312        90%             35
SRR073770     52%          48%          5372814       4835540        90%             35
SRR073771     53%          47%          5210292       4637162        89%             35

Gene Expression Analysis

Principal component analysis and hierarchical cluster analysis were performed to compare the gene expression difference of the benign and malignant samples in a global view. As shown in Figure 1 A and B, benign and malignant samples are well separated on the gene expression level in both the PCA plot and the hierarchical cluster plot, indicating that prostatic tumor samples can be distinguished from normal samples on the gene expression level. Interestingly, we observed that this effect also exists on the lncRNA expression level. Specifically, the 3 cancer samples are well separated from the other 3 normal samples in both the PCA plot (Figure 1 C) and the hierarchical cluster plot (Figure 1 D). This evidence suggests that besides genes, lncRNAs are also candidate molecular markers that can distinguish prostate cancer samples from benign samples, indicating the involvement of lncRNAs in the regulation and development of prostate cancer.

DEGs and DELs Screening Analysis

In order to detect genes involved in the process from benign to malignant, Student's t-test was conducted to identify the DEGs. In total, 7112 DEGs were identified when setting the threshold of p < 0.0001. Typical DEGs include ZNF512B, UCKL1, STMN3, GMEB2, and PTK6. The distribution of DEG p values is shown in Figure 2 A. The randomization test (see Materials and Methods) shows that the false discovery rate (FDR) is lower than 0.01, suggesting that these DEGs are more likely to be true signals than background noise. Further, we used the same test to screen differentially expressed lncRNAs (DELs). In the end, we obtained 26 significant DELs out of 725 expressed lncRNAs. The p value distribution for lncRNA is shown in Figure 2 B.

Figure 1. Principal component analysis (A) and hierarchical cluster analysis (B) of 6 samples including 3 benign samples and 3 cancer samples on gene expression level.