Genetics: Early Online, published on October 11, 2016 as 10.1534/genetics.116.193714

Detecting sources of transcriptional heterogeneity in large-scale RNA-Seq data sets

Brian C. Searle, Rachel M. Gittelman, Ohad Manor, and Joshua M. Akey

Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA

Copyright 2016.

RUNNING TITLE
Detecting sources of heterogeneity

KEY WORDS
GTEx Consortium; gene expression normalization; random forest classification; transcriptional heterogeneity

CORRESPONDING AUTHOR
Joshua M. Akey, PhD
[email protected]
Department of Genome Sciences
University of Washington School of Medicine
Box 355065
1705 NE Pacific Street
Seattle, WA 98195

ABSTRACT
Gene expression levels are dynamic molecular phenotypes that respond to biological, environmental, and technical perturbations. Here we use a novel replicate classifier approach for discovering transcriptional signatures and apply it to the Genotype-Tissue Expression (GTEx) data set. We identified many factors contributing to expression heterogeneity, such as collection center and ischemia time, and our approach of scoring replicate classifiers allows us to statistically stratify these factors by effect strength. Strikingly, from transcriptional expression in blood alone we detect markers that help predict heart disease and stroke in some patients. Our results illustrate the challenges and opportunities of interpreting patterns of transcriptional variation in large-scale data sets.

INTRODUCTION
Unlike previous large-scale tissue (FANTOM et al. 2015) or cell-type (ENCODE et al. 2012) specific expression data sets, the GTEx Project (GTEx et al. 2015) is unique in the breadth of tissue types sampled from the same individuals.
The GTEx Consortium has previously demonstrated that tissue-specific gene expression signatures are preserved in postmortem samples using hierarchical clustering (Melé et al. 2015), which groups samples by gene expression using a data-driven approach to identify hidden structure in the data. While hierarchical clustering is effective at identifying the greatest global source of variation, it does not capture more subtle sources of variation. For example, in the context of the GTEx Project, hierarchical clustering largely captures gene expression variation due to tissue type, but less effectively captures the influence of confounding factors like age or sex.

Using the GTEx pilot data freeze version 4, we attempted to recapitulate the results of hierarchical clustering using supervised Random Forest (RF) classification (Breiman 2001). Unlike hierarchical clustering, RF uses sample type annotations in a training data set to create decision trees where the nodes correspond to genes whose expression levels distinguish between tissue types. Although RF classification typically considers a single classifier per classification task, we randomly generated replicate classifiers to statistically assess how well two groups can be distinguished. This approach is markedly distinct from hierarchical clustering or PCA and enables statistical uncertainty to be rigorously quantified. These analyses reveal strong transcriptional signatures that contribute to patterns of expression heterogeneity in the GTEx data. More broadly, our results highlight that a deeper understanding of the determinants of transcriptional variation enables insights into the biological factors that govern variation in gene expression among tissues and individuals.

MATERIALS AND METHODS

Normalization and Data Curating:
We first removed samples of non-European descent and summed counts for technical replicates.
We normalized expression profiles to the upper quartile (Bullard et al. 2010) and removed globally weak responding genes (13.7%) where no tissue type had more than 10 samples with at least 5 counts. We also removed D-statistic outlier samples ('t Hoen et al. 2013) (approximately 3%) if they correlated poorly across all genes to over half the samples within the tissue type (Pearson’s correlation scores <0.5). This resulted in a data set containing 17,702 genes and 1,821 total samples from 165 donors across the 23 tissue types we considered. The genes, samples, and donors are enumerated in Table S3. We used normalized gene expression results downloaded from the GTEx website to interrogate the effect of DESeq/PEER normalization on the strength of cofactors.

Random Forest classifiers for predicting tissue type:
Random Forest is an ensemble method for classifying groups using a collection of decision trees. In this work we chose to use RF because unlike most machine learning approaches, RF classification is robust in the face of large numbers of features and high feature redundancy, making it ideal for classifying gene expression data sets. Additionally, the decision trees inside a RF are generally easy to interpret, which we exploit to identify split-point gene signatures. Realistically, considering only the issue of tissue classification, the approach we present here would still likely work well with a different classification method as its foundation.

Each tree in the forest is trained using a bootstrapped selection of “in-bag” samples (a random subset with replacement). Typically decision trees are trained at each node to determine the feature (of N total features) that best splits the samples between two classes using the entire feature set. However, in RF typically each split is chosen from a subset of the square root of N randomly selected features.
These two levels of randomness help buffer against over-fitting in the presence of a high number of features. RF prediction for a sample is essentially a voting system where the prediction is the majority vote classification across all of the trees.

In this study we used entropy as a measure of information gain when selecting decision points. Decision split-points that were already biased towards a 90% or greater decision were eliminated to improve generalization. Within each forest, approximately 37% of the samples are not selected (i.e. are “out-of-bag”) in each bootstrapped sample group. While RF tree pruning is uncommon because of efficiency concerns, we use these out-of-bag samples to prune leaves that lower classification accuracy in “unseen” training set samples to help improve generalization.

Unlike SVM or logistic regression methods that produce unique, global solutions, RF classifiers are affected by random starting conditions. Each time we trained a RF classifier we selected a different random starting point and a different subset of training data, which consequently produced slightly different performances. We took advantage of this by generating 100 “one-vs-rest” (Bishop 2006) binary RF classifiers for a given tissue type where each classifier operated as a “technical” replicate.

Each RF was aggregated across 100 weak predictor decision trees. This number of weak predictors was the point at which we found that the ROC-AUC had converged. We used all of the M query samples (specific to the classified tissue type) in our training/testing sample pool and M/2 non-target samples from each of the background tissue types to maintain an even distribution for classifier comparison. Each forest was generated using 80% of the samples randomly selected from the sample pool for each tissue type.
The corresponding 20% of the samples were reserved exclusively to evaluate the classification accuracy of our classifier. We limited the number of non-target samples in the testing set to be no more than 90% of the total testing pool. This percentage is relatively high to ensure sufficient background tissue diversity.

In an effort to speed up the process of classification, we trimmed each feature list before classification to the top 1% of genes that separated the 80% randomly selected training samples using a Mann-Whitney U test. Finally, we performed ROC-AUC integration calculations using the trapezoidal rule and calculated confidence intervals around median ROC-AUC values using medians of 100 bootstrapped sample sets.

Since we generated 100 replicate RF classifiers per tissue, we can determine critical decision genes by counting the number of times each gene is used as a decision split-point. Due to the decision tree splitting procedure the actual number of times a gene can be used for splitting scales with the number of samples. However, tissue-specific genes are used repeatedly over the 100 replicate classifiers, and the relative number of repeats can indicate key tissue-specific decision split-points. For each tissue type, we ran Gene Ontology (GO) enrichment analysis using the online PANTHER Overrepresentation Test (Mi et al. 2013) (release 20150430) using the Homo sapiens GO Ontology database (Released 2015-06-06) on the top 100 decision split-point genes. We required a stringent <0.05 Bonferroni corrected p-value for Biological Process GO enrichment. The number of independent tests is calculated as the number of ontology classes with at least two genes in the reference list.
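
As an illustration of the scoring step described in this section, the following is a minimal sketch of ROC-AUC integration by the trapezoidal rule and of a bootstrapped interval around the median of replicate ROC-AUC values. It is not the authors' code. The function names, the percentile form of the interval, the 95% level, and the fixed random seed are all assumptions made for the example; the text specifies only the trapezoidal rule and the use of medians of 100 bootstrapped sample sets.

```python
import random
import statistics


def roc_auc_trapezoid(labels, scores):
    """Area under the ROC curve by trapezoidal integration.

    labels: 1 for the target class, 0 for background (hypothetical encoding).
    scores: classifier scores, higher meaning "more target-like".
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    # Walk down the ranked samples; tied scores move the curve in one step.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    auc = 0.0
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        tpr, fpr = tp / n_pos, fp / n_neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0  # one trapezoid
        prev_tpr, prev_fpr = tpr, fpr
    return auc


def bootstrap_median_ci(values, n_boot=100, alpha=0.05, seed=0):
    """Percentile interval (an assumed choice) for the median of replicate AUCs."""
    rng = random.Random(seed)
    medians = sorted(
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    lower = medians[int((alpha / 2) * (n_boot - 1))]
    upper = medians[int((1 - alpha / 2) * (n_boot - 1))]
    return statistics.median(values), lower, upper
```

In use, `roc_auc_trapezoid` would be applied to the held-out 20% of samples for each of the 100 replicate classifiers, and the resulting list of 100 values would be passed to `bootstrap_median_ci`.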

Random Forest classifier for predicting blood-specific signatures:
Blood-specific markers for identifying sex, collection center, and ischemia time were identified using binary Random Forests, while classification of donor death was performed with one-vs-rest Random Forests using a similar system to that for predicting tissue type. We broke each factor down into the lowest number of possible classes specifically to limit signal dilution across the 165 donor blood samples. A randomly guessing classifier should produce a ROC-AUC score of 0.5. To verify this, for each classifier we randomly permuted the sample labels and calculated a background ROC-AUC.
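
The permutation check described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure. Here the classifier scores are held fixed and only the labels are shuffled, whereas the text implies the background ROC-AUC was computed per classifier and may have involved retraining on permuted labels. The function names are hypothetical, and the AUC is computed by pairwise comparison (the Mann-Whitney formulation), which gives the same value as the area under the ROC curve.

```python
import random
import statistics


def pairwise_auc(labels, scores):
    """ROC-AUC as the fraction of (target, background) pairs ranked correctly.

    Ties count as half a win, matching the area under the ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permuted_background_auc(labels, scores, n_perm=100, seed=0):
    """Background ROC-AUC values obtained after randomly permuting the labels."""
    rng = random.Random(seed)
    shuffled = list(labels)  # copy so the caller's labels are left untouched
    aucs = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        aucs.append(pairwise_auc(shuffled, scores))
    return aucs


# Toy data: 20 target and 20 background samples that separate perfectly.
labels = [1] * 20 + [0] * 20
scores = [float(40 - i) for i in range(40)]
observed = pairwise_auc(labels, scores)
background = permuted_background_auc(labels, scores)
print(observed, statistics.mean(background))
```

With permuted labels the background values should scatter around 0.5, which gives a reference distribution against which an observed ROC-AUC can be compared.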