A Thesis Data Pooling to Identify Differentially Expressed Genes In

A Thesis

entitled

Data Pooling to Identify Differentially Expressed Genes in Lung Cancer of Nonsmokers

by

Nicole Carr

Submitted to the Graduate Faculty as partial fulfillment of the requirements for the Master of Science Degree in Biomedical Sciences

Dr. Sadik A. Khuder, Committee Chair
Dr. Barbara Saltzman, Committee Member
Dr. Alexei Fedorov, Committee Member
Dr. Patricia R. Komuniecki, Dean, College of Graduate Studies

The University of Toledo
May 2016

Copyright 2016, Nicole M. Carr

This document is copyrighted material. Under copyright law, no parts of this document may be reproduced without the expressed permission of the author.

An Abstract of

Data Pooling to Identify Differentially Expressed Genes in Lung Cancer of Nonsmokers

by

Nicole Carr

Submitted to the Graduate Faculty as partial fulfillment of the requirements for the Master of Science Degree in Biomedical Sciences

The University of Toledo
May 2016

Lung cancer is the leading cause of cancer deaths in the United States. While cigarette smoking is the principal causal agent of lung cancer, 10–15% of patients have no history of smoking, yet still manifest gene expression similar to that of smokers. Although microarrays have been applied widely to lung cancer, few studies have explored the specific expression differences between smokers and nonsmokers. The purpose of this study was to identify differentially expressed genes in lung cancer of nonsmokers by combining data from different microarray platforms using quantile transformation. Two statistical approaches were introduced in this study: ordinal logistic regression and a new statistical test based on the DISCO Normal distribution. Results from these models were corroborated against the findings of modified t-tests. The top reported genes were shown to play diverse roles in the cellular environment, including muscle maintenance, DNA damage repair mechanisms, cellular metabolism, proliferation and growth, and tumor progression. The results of this study give further insight into the genetic differences observed in lung cancer patients and demonstrate the opportunity to combine data from different microarray platforms.

Acknowledgements

I would like to extend a most sincere thank you to my mentor, Dr. Sadik Khuder, for all he has taught me. Without his kindness, patience, knowledge, and experience, this project would not have been possible. He's an ordinary person with an extraordinary heart.

Thank you to Dr. Barbara Saltzman, not only for her participation as a committee member but also for her unparalleled kindness and sincere guidance through my academic career. Her words of wisdom will never be forgotten.

Additional thanks to Dr. Alexei Fedorov, again not only for being a committee member, but for always driving me towards success and inspiring me to learn. I feel sincerely privileged to have had him as a professor.

Many thanks as well to Dr. Sjaak Philipsen, of the Erasmus University Medical Center Rotterdam, for supplying phenotype information beyond what was publicly available for his study.

My deepest gratitude to Dr. Bob Blumenthal and Jo Anne Gray. Without the two of them, their understanding, their perseverance, and their dedication, I may never have made it through the program. I am forever indebted to their kindness.

Finally, sincere thanks to my family and Nathanial Carter. Their endless love and support have always inspired me to strive for the highest.

Table of Contents

An Abstract of
Acknowledgements
Table of Contents
List of Tables
List of Figures
1 Introduction
2 Background
    2.1 Histologies of Lung Cancer
    2.2 Causes of Lung Cancer
    2.3 Familial Lung Cancer
    2.4 Microarray Analysis
    2.5 DNA Hybridization
    2.6 Types of Microarrays
    2.7 Cross Validation Studies
    2.8 Statistical Methods
        2.8.1 DISCO Normal Distribution
        2.8.2 Ordinal Logistic Regression
3 Methods
    3.1 Data Collection
        3.1.1 Tumor Tissue Data
        3.1.2 Tumor and Adjacent Normal Tissue Data
    3.2 Data Analysis
    3.3 Data Comparison
    3.4 Quantile Scoring Method
4 Results
    4.1 Individual Platform Results
    4.2 Quantile Combination: Non-Smokers
    4.3 Quantile Combination: Smokers
    4.4 Quantile Combination: Tumor Samples
    4.5 Common Non-Smoker / Smoker Results
5 Discussion
6 Conclusion
References
A Supplementary Tables and Figures
B R-Codes

List of Tables

Table 4.1: Top 25 common DEGs from non-smoker samples.
Table 4.2: Top 25 common DEGs from smoker samples.
Table 4.3: Top 25 common DEGs from tumor samples.
Table 4.4: Common DEGs from smoker and non-smoker results.
Table A.1: List of microarray analyses used in this study.
Table A.2: Common 407 DEGs from non-smoker data pool.
Table A.3: 493 common DEGs from smoker data pool.
Table A.4: Top 100 common DEGs from tumor samples.

List of Figures

Figure 3-1: Example Venn diagram.
Figure 3-2: Schematic representation of Quantile Transformation technique.
Figure 4-1: Relationship of statistical results for non-smokers.
Figure 4-2: Relationship of statistical results for smokers.
Figure 4-3: Relationship of statistical results for tumor samples.
Figure A-1: HGU133A platform merge results.
Figure A-2: HGU133A platform merge results.
