The Identification of Interacting Gene-Environmental Factor Pairs

Chad Kimmel, MS (1,2); Jonathan Lustgarten, PhD (3); An-Kwok Ian Wong, MS (2); Steve Handler, MD, PhD (1,2); Shyam Visweswaran, MD, PhD (1,2)

(1) Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA
(2) The Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA
(3) University of Pennsylvania School of Veterinary Medicine, Philadelphia, PA

ABSTRACT

The etiological factors of disease include genetic and environmental factors, and these factors may interact in causing disease. Genetic and environmental factors which are associated with the same diseases are likely candidates for interactions or associations. For example, a genetic factor and an environmental factor that have several diseases in common may be related such that the environmental factor interacts with the gene. We extracted gene-disease and environmental factor-disease associations and computed a similarity score between each gene and environmental factor pair based on shared diseases. For several of the gene-environmental factor pairs, we found evidence for plausible biological relationships in the literature. We postulate that the remaining gene-environmental factor pairs may have novel interactions.

Keywords: Environmental Factors, Genetic Factors, Interactions

1. INTRODUCTION

Diseases can be caused by a wide variety of both genes and environmental factors. Often, these genes and environmental factors interact in the pathology of a disease.

This paper presents a methodology to identify environmental factors and genes which cause similar diseases and to test for potential interactions. We first extracted the genetic and environmental factors of disease from the Genetic Association Database (GAD) [1], the CHE Toxicant and Disease Database [2], and the Medical Subject Heading (MeSH) indexing of MedLine. Using the Jaccard similarity score, we then calculated the similarity in the diseases caused between every environmental factor and genetic factor. The specific environmental factor and gene involved in the similarity calculation were called a "pair." For validation, we then investigated the most similar environmental factor and gene pairs, ranked by their diseases caused, for biological relationships in the literature and in Ingenuity (Ingenuity Systems®, www.Ingenuity.com), a systems biology database. A biological relationship consisted of either the environmental factor having a direct influence on the corresponding gene, or the environmental factor and gene (along with the protein product) having a direct influence on the same biological molecule (e.g., gene, protein, metabolite). An example of a biological relationship would be the increase or decrease in transcription for a given gene.

Several of the most similar environmental factor and gene pairs according to their diseases caused were found to have a biological relationship, and we then inferred that the pairs which had no known relationship may be good candidates for a biological relationship due to their commonly caused diseases. Furthermore, these pairs may be good candidates for a biological interaction in their commonly caused diseases, since environmental factors and genes which cause more similar diseases should be more likely to have a biological interaction. In this context, a biological interaction is defined as the interdependent association between an environmental factor and gene in the causation of a disease.

2. METHODS

Genetic Factors. We obtained the genetic factors of human diseases from the Genetic Association Database (GAD), which is a curated archive of human genetic association studies of common diseases [1]. The GAD represents genes with NCBI Entrez Gene identifiers and diseases with Medical Subject Headings (MeSH) identifiers.

Environmental Factors. We obtained the environmental factors of disease from two sources. The first source was the CHE Toxicant and Disease Database [2], which is a curated database of chemical-disease associations. The second source was the MeSH annotation of MedLine articles. We used patterns of MeSH headings and sub-headings to infer environmental factor to disease associations following the strategy used by Liu et al. [3].

Similarity Score. We calculated the similarity between every genetic and environmental factor pair based on the number of diseases common to them. In particular, we calculated the Jaccard similarity score for a genetic factor G and environmental factor E using the following expression:

$$\mathrm{jaccard}(G, E) = \frac{S_{11}}{S_{11} + S_{10} + S_{01}} \tag{1}$$

where $S_{11}$ is the number of diseases associated with both G and E, $S_{10}$ is the number of diseases associated with G but not E, and $S_{01}$ is the number of diseases associated with E but not G. The Jaccard similarity measure varies between 0 and 1, where 1 represents the greatest amount of similarity and 0 represents the least amount of similarity. Only factors which had at least three diseases caused were used in the subsequent analysis.

Relationship Between Factors. We examined gene-environmental factor pairs with a Jaccard similarity score of 0.4 or greater for plausible biological relationships in the literature and in Ingenuity (Ingenuity Systems®, www.Ingenuity.com). Ingenuity is a systems biology database that contains manually annotated relationships between biological agents such as chemicals, drugs, genes, and proteins. In Ingenuity, we searched for evidence that a gene in a pair is directly influenced by the environmental factor in the pair. We also searched for biological molecules (genes, metabolites, etc.) that may be related to both the gene and the environmental factor in a pair.

3. RESULTS

We extracted a total of 8,872 disease-gene associations between 889 diseases and 1,891 genes from GAD, and a total of 51,994 disease-environmental associations between 1,911 diseases and 5,801 environmental agents from the CHE database and MedLine. After the factors with fewer than three diseases caused were excluded, the total number of environmental factors decreased to 2,459, and the total number of genes decreased to 1,469. This resulted in more than three and a half million environmental factor-gene pairs for which we computed a Jaccard score. Table 1 gives the environmental factor-gene pairs that had a Jaccard similarity score of at least 0.4 along with the common diseases caused.

| Environmental Factor/Gene Pair | Common Diseases Caused |
|---|---|
| Aminopyridines/HLA-DQB1 | Asthma/Occupational Diseases |
| Androstenes/F13B | Thromboembolism/Venous Thrombosis |
| Antithrombins/F3 * | Hemorrhage/Coronary Thrombosis |
| Carbon Compounds, Inorganic/LAPTM4B | Lung Neoplasms/Stomach Neoplasms |
| Carmine/HLA-DQB1 * | Asthma/Occupational Diseases |
| Cellulase/CYSLTR2 | Asthma/Rhinitis |
| Cellulase/HLA-DQB1 | Asthma/Occupational Diseases |
| Cellulase/IL4R1 | Asthma/Rhinitis |
| Cereals/BIRC5 | Cell Transformation, Neoplastic/Uterine Cervical Neoplasms |
| Cereals/KIR3DL2 | Cervical Intraepithelial Neoplasia/Uterine Cervical Neoplasms |
| Cereals/KIR3DL3 | Cervical Intraepithelial Neoplasia/Uterine Cervical Neoplasms |
| Egg Proteins/HLA-DQB1 | Asthma/Occupational Diseases |
| Escin/HLA-DQB1 | Asthma/Occupational Diseases |
| Fertility Agents/NBS1 | Melanoma/Breast Neoplasms/Ovarian Neoplasms/Skin Neoplasms |
| Food, Formulated/DIO1 | Atrophy; Alzheimer Disease |
| Glycyrrhizic Acid/CYP11B1 * | Hypertension/Hyperaldosteronism |
| Meclizine/LEF1 | Cleft Lip/Cleft Palate |
| Meclizine/BHMT2 | Cleft Lip/Cleft Palate |
| Meclizine/C6orf105 | Cleft Lip/Cleft Palate |
| Meclizine/COL4A4 | Cleft Lip/Cleft Palate |
| Meclizine/DLX3 | Cleft Lip/Cleft Palate |
| Meclizine/GLI2 | Cleft Lip/Cleft Palate |
| Meclizine/HDAC4 | Cleft Lip/Cleft Palate |
| Meclizine/MLPH | Cleft Lip/Cleft Palate |
| Meclizine/SCN3B | Cleft Lip/Cleft Palate |
| Meclizine/SHH | Cleft Lip/Cleft Palate |
| Meclizine/SP100 | Cleft Lip/Cleft Palate |
| Methenamine/HLA-DQB1 | Asthma/Occupational Diseases |
| Methyl n-Butyl Ketone/ATP8B1 | Cholestasis/Cholestasis, Intrahepatic |
| Methylenebis(chloroaniline)/ZNF350 | Urinary Bladder Neoplasms/Carcinoma, Transitional Cell |
| Neurotransmitter Uptake Inhibitors/INPP1 | Stress Disorders, Post-Traumatic/Bipolar Disorder |
| Ninhydrin/CYSLTR2 | Asthma/Rhinitis |
| Nitrilotriacetic Acid/RGS6 | Lung Neoplasms/Urinary Bladder Neoplasms |
| Noise/KCNQ4 | Hearing Loss/Hearing Loss, Noise-Induced |
| o-Phthalaldehyde/HLA-DQB1 | Asthma/Occupational Diseases |
| Oxalates/CLCN5 * | Nephrocalcinosis/Kidney Calculi/Kidney Diseases |
| Papain/IL4R1 | Conjunctivitis; Asthma/Rhinitis |
| Parabens/CYP24A1 | Breast Neoplasms/Asthma |
| Parabens/KDR | Breast Neoplasms/Asthma |
| Pectins/HLA-DQB1 | Asthma/Occupational Diseases |
| Pentylenetetrazole/KCNMB3 | Epilepsies, Myoclonic/Epilepsy, Generalized/Epilepsy, Absence |
| Pesticide Synergists/CXCL14 | Carcinoma, Hepatocellular/Liver Neoplasms |
| Pesticide Synergists/PTK2 | Carcinoma, Hepatocellular/Liver Neoplasms |
| Pesticide Synergists/SMYD3 * | Carcinoma, Hepatocellular/Liver Neoplasms |
| Pesticide Synergists/UGT1A4 | Carcinoma, Hepatocellular/Liver Neoplasms |
| Pesticide Synergists/UGT1A8 | Carcinoma, Hepatocellular/Liver Neoplasms |
| Phenylurea Compounds/HSD11B2 | Diabetic Nephropathies/Diabetes Mellitus, Type I |
| Phenylurea Compounds/TCF2 | Diabetic Nephropathies/Diabetic Neuropathies/Diabetes Mellitus, Type I |
| Piperonyl Butoxide/CXCL14 | Carcinoma, Hepatocellular/Liver Neoplasms |
| Piperonyl Butoxide/SMYD3 * | Carcinoma, Hepatocellular/Liver Neoplasms |
| Pipobroman/CSF3R | Leukemia/Myelodysplastic Syndromes/Anemia, Aplastic |
| Pipobroman/CSF3R | Leukemia/Myelodysplastic Syndromes/Anemia, Aplastic |
| Platelet-Derived Growth Factor/CCNH | Mouth Neoplasms/Precancerous Conditions |
| Potassium Channel Blockers/KCNJ2 * | Long QT Syndrome/Arrhythmias, Cardiac |
| Potassium Channel
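The similarity computation described in the Methods (Jaccard score over shared diseases, exclusion of factors with fewer than three diseases, and the 0.4 cutoff) can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function names `jaccard` and `similar_pairs`, the dictionary layout, and the toy identifiers (`GENE_A`, `FACTOR_X`, `d1`, ...) are all assumptions introduced here, and none of the toy data comes from GAD, the CHE database, or MedLine.

```python
from itertools import product


def jaccard(diseases_g, diseases_e):
    """Jaccard similarity of two disease sets, following expression (1)."""
    s11 = len(diseases_g & diseases_e)  # diseases associated with both G and E
    s10 = len(diseases_g - diseases_e)  # diseases associated with G but not E
    s01 = len(diseases_e - diseases_g)  # diseases associated with E but not G
    total = s11 + s10 + s01
    # Two empty sets have no defined overlap; return 0.0 by convention (assumption).
    return s11 / total if total else 0.0


def similar_pairs(gene_diseases, env_diseases, min_diseases=3, threshold=0.4):
    """Return (factor, gene, score, shared diseases) for pairs scoring >= threshold."""
    # Keep only factors with at least `min_diseases` associated diseases.
    genes = {g: d for g, d in gene_diseases.items() if len(d) >= min_diseases}
    envs = {e: d for e, d in env_diseases.items() if len(d) >= min_diseases}

    pairs = []
    # Score every environmental factor against every gene.
    for (e, de), (g, dg) in product(envs.items(), genes.items()):
        score = jaccard(dg, de)
        if score >= threshold:
            pairs.append((e, g, score, sorted(dg & de)))
    # Most similar pairs first.
    return sorted(pairs, key=lambda p: -p[2])


# Hypothetical toy associations, for illustration only.
gene_diseases = {
    "GENE_A": {"d1", "d2", "d3"},
    "GENE_B": {"d1", "d4", "d5", "d6"},
    "GENE_C": {"d1", "d2"},  # dropped: fewer than three diseases
}
env_diseases = {
    "FACTOR_X": {"d1", "d2", "d3", "d7"},
    "FACTOR_Y": {"d4", "d5", "d6", "d8"},
}

for factor, gene, score, shared in similar_pairs(gene_diseases, env_diseases):
    print(f"{factor}/{gene}: {score:.2f} {'/'.join(shared)}")
```

In the toy data, `FACTOR_X`/`GENE_A` share three of four distinct diseases (score 0.75) and `FACTOR_Y`/`GENE_B` share three of five (score 0.6), so both pass the cutoff, while the remaining combinations fall below it. The exhaustive loop mirrors the all-pairs comparison in the text: with 2,459 environmental factors and 1,469 genes it would score 2,459 × 1,469 = 3,612,271 pairs, which matches the "more than three and a half million" reported above.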