The Identification of Interacting Gene-Environmental Factor Pairs

Chad Kimmel, MS1,2, Jonathan Lustgarten, PhD3, An-Kwok Ian Wong, MS2, Steve Handler, MD, PhD1,2, Shyam Visweswaran, MD, PhD1,2
1Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA
2The Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA
3University of Pennsylvania School of Veterinary Medicine, Philadelphia, PA

ABSTRACT

The etiological factors of disease include genetic and environmental factors, and these factors may interact in causing disease. Genetic and environmental factors which are associated with the same diseases are likely candidates for interactions or associations. For example, a genetic factor and an environmental factor that have several diseases in common may be related such that the environmental factor interacts with the gene. We extracted gene-disease and environmental factor-disease associations and computed a similarity score between each gene and environmental factor pair based on shared diseases. For several of the gene-environmental factor pairs, we found evidence for plausible biological relationships in the literature. We postulate that the remaining gene-environmental factor pairs may have novel interactions.

Keywords: Environmental Factors, Genetic Factors, Interactions

1. INTRODUCTION

Diseases can be caused by a wide variety of both genes and environmental factors. Often, these genes and environmental factors interact in the pathology of a disease.

This paper presents a methodology to identify environmental factors and genes which cause similar diseases and to test for potential interactions. We first extracted the genetic and environmental factors of disease from the Genetic Association Database (GAD) [1], the CHE Toxicant and Disease Database [2], and the Medical Subject Heading (MeSH) indexing of MedLine. Using the Jaccard similarity score, we then calculated the similarity in the diseases caused between every environmental factor and genetic factor. The specific environmental factor and gene involved in the similarity calculation were called a "pair." For validation, we then investigated the most similar environmental factor and gene pairs, according to their diseases caused, for biological relationships in the literature and in Ingenuity (Ingenuity Systems®, www.Ingenuity.com), a systems biology database. A biological relationship consisted of either the environmental factor having a direct influence on the corresponding gene, or the environmental factor and gene (along with its product) having a direct influence on the same biological molecule (e.g., gene, protein, metabolite). An example of a biological relationship would be the increase or decrease in transcription for a given gene.

Several of the most similar environmental factor and gene pairs, according to their diseases caused, were found to have a biological relationship, and we then inferred that the pairs which had no known relationship may be good candidates for one because of their commonly caused diseases. Furthermore, these pairs may be good candidates for a biological interaction in their commonly caused diseases, since environmental factors and genes which cause more similar diseases should be more likely to have a biological interaction. In this context, a biological interaction is defined as the interdependent association between an environmental factor and a gene in the causation of a disease.

2. METHODS

Genetic Factors. We obtained the genetic factors of human diseases from the Genetic Association Database (GAD), which is a curated archive of human genetic association studies of common diseases [1]. The GAD represents genes with NCBI Gene identifiers and diseases with Medical Subject Headings (MeSH) identifiers.

Environmental Factors. We obtained the environmental factors of disease from two sources. The first source was the CHE Toxicant and Disease Database [2], which is a curated database of chemical-disease associations. The second source was the MeSH annotation of MedLine articles. We used patterns of MeSH headings and sub-headings to infer environmental factor to disease associations, following the strategy used by Liu et al. [3].

Similarity Score. We calculated the similarity between every genetic and environmental factor pair based on the number of diseases common to them. In particular, we calculated the Jaccard similarity score for a genetic factor G and environmental factor E using the following expression:

$$\mathrm{jaccard}(G,E) = \frac{S_{11}}{S_{11} + S_{10} + S_{01}} \qquad (1)$$

where $S_{11}$ is the number of diseases associated with both G and E, $S_{10}$ is the number of diseases associated with G but not E, and $S_{01}$ is the number of diseases associated with E but not G. The Jaccard similarity measure varies between 0 and 1, where 1 represents the greatest amount of similarity and 0 represents the least. Only factors which had at least three diseases caused were used in the subsequent analysis.

Relationship Between Factors. We examined gene-environmental factor pairs with a Jaccard similarity score of 0.4 or greater for plausible biological relationships in the literature and in Ingenuity (Ingenuity Systems®, www.Ingenuity.com). Ingenuity is a systems biology database that contains manually annotated relationships between biological agents such as chemicals, drugs, and genes. In Ingenuity, we searched for evidence that a gene in a pair is directly influenced by the environmental factor in the pair. We also searched for biological molecules (genes, metabolites, etc.) that may be related to both the gene and the environmental factor in a pair.

3. RESULTS

We extracted a total of 8,872 disease-gene associations between 889 diseases and 1,891 genes from GAD, and a total of 51,994 disease-environmental associations between 1,911 diseases and 5,801 environmental agents from the CHE database and MedLine. After the factors with fewer than three diseases caused were excluded, the total number of environmental factors decreased to 2,459, and the total number of genes decreased to 1,469. This resulted in more than three and a half million environmental factor-gene pairs for which we computed a Jaccard score. Table 1 gives the environmental factor-gene pairs that had a Jaccard similarity score of at least 0.4 along with the common diseases caused.
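
The Jaccard similarity score defined in the Methods, together with the three-disease minimum and the 0.4 cutoff, can be illustrated with a minimal Python sketch. The function names, the dictionary-of-sets input layout, and the toy data are assumptions made for illustration only; they are not the authors' implementation and are not drawn from GAD, CHE, or MedLine.

```python
from itertools import product

MIN_DISEASES = 3    # factors with fewer diseases are excluded (threshold from the text)
SCORE_CUTOFF = 0.4  # pairs at or above this score are reported (threshold from the text)


def jaccard(gene_diseases, env_diseases):
    """Return S11 / (S11 + S10 + S01) for two sets of disease identifiers."""
    s11 = len(gene_diseases & env_diseases)  # diseases shared by G and E
    s10 = len(gene_diseases - env_diseases)  # diseases of G but not E
    s01 = len(env_diseases - gene_diseases)  # diseases of E but not G
    total = s11 + s10 + s01
    return s11 / total if total else 0.0


def candidate_pairs(gene_map, env_map, min_diseases=MIN_DISEASES, cutoff=SCORE_CUTOFF):
    """Score every environmental factor/gene pair; keep those at or above the cutoff."""
    genes = {g: d for g, d in gene_map.items() if len(d) >= min_diseases}
    envs = {e: d for e, d in env_map.items() if len(d) >= min_diseases}
    hits = []
    for (env, env_dis), (gene, gene_dis) in product(envs.items(), genes.items()):
        score = jaccard(gene_dis, env_dis)
        if score >= cutoff:
            hits.append((env, gene, score, sorted(env_dis & gene_dis)))
    # Highest-scoring pairs first
    return sorted(hits, key=lambda hit: -hit[2])


# Hypothetical toy data (placeholder names, for illustration only).
gene_map = {
    "GENE_A": {"Asthma", "Rhinitis", "Urticaria"},
    "GENE_B": {"Asthma", "Hypertension"},  # dropped: fewer than three diseases
}
env_map = {
    "FACTOR_X": {"Asthma", "Rhinitis", "Angioedema", "Urticaria"},
    "FACTOR_Y": {"Melanoma", "Skin Neoplasms", "Asthma"},
}

pairs = candidate_pairs(gene_map, env_map)
for env, gene, score, shared in pairs:
    print(f"{env}/{gene}: {score:.2f} ({'/'.join(shared)})")
```

In this toy input only FACTOR_X/GENE_A passes the cutoff (three shared diseases out of four in the union). An exhaustive loop of this kind over the 2,459 environmental factors and 1,469 genes retained after filtering visits 2,459 × 1,469 = 3,612,271 pairs, which matches the reported "more than three and a half million" pairs.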

Environmental Factor/Gene Pair | Common Diseases Caused
Aminopyridines/HLA-DQB1 | Asthma/Occupational Diseases
Androstenes/F13B | Thromboembolism/Venous Thrombosis
Antithrombins/F3 * | Hemorrhage/Coronary Thrombosis
Carbon Compounds, Inorganic/LAPTM4B | Lung Neoplasms/Stomach Neoplasms
Carmine/HLA-DQB1 * | Asthma/Occupational Diseases
Cellulase/CYSLTR2 | Asthma/Rhinitis
Cellulase/HLA-DQB1 | Asthma/Occupational Diseases
Cellulase/IL4R1 | Asthma/Rhinitis
Cereals/BIRC5 | Cell Transformation, Neoplastic/Uterine Cervical Neoplasms
Cereals/KIR3DL2 | Cervical Intraepithelial Neoplasia/Uterine Cervical Neoplasms
Cereals/KIR3DL3 | Cervical Intraepithelial Neoplasia/Uterine Cervical Neoplasms
Egg Proteins/HLA-DQB1 | Asthma/Occupational Diseases
Escin/HLA-DQB1 | Asthma/Occupational Diseases
Fertility Agents/NBS1 | Melanoma/Breast Neoplasms/Ovarian Neoplasms/Skin Neoplasms
Food, Formulated/DIO1 | Atrophy/Alzheimer Disease
Glycyrrhizic Acid/CYP11B1 * | Hypertension/Hyperaldosteronism
Meclizine/LEF1 | Cleft Lip/Cleft Palate
Meclizine/BHMT2 | Cleft Lip/Cleft Palate
Meclizine/C6orf105 | Cleft Lip/Cleft Palate
Meclizine/COL4A4 | Cleft Lip/Cleft Palate
Meclizine/DLX3 | Cleft Lip/Cleft Palate
Meclizine/GLI2 | Cleft Lip/Cleft Palate
Meclizine/HDAC4 | Cleft Lip/Cleft Palate
Meclizine/MLPH | Cleft Lip/Cleft Palate
Meclizine/SCN3B | Cleft Lip/Cleft Palate
Meclizine/SHH | Cleft Lip/Cleft Palate
Meclizine/SP100 | Cleft Lip/Cleft Palate
Methenamine/HLA-DQB1 | Asthma/Occupational Diseases
Methyl n-Butyl Ketone/ATP8B1 | Cholestasis/Cholestasis, Intrahepatic
Methylenebis(chloroaniline)/ZNF350 | Urinary Bladder Neoplasms/Carcinoma, Transitional Cell
Neurotransmitter Uptake Inhibitors/INPP1 | Stress Disorders, Post-Traumatic/Bipolar Disorder
Ninhydrin/CYSLTR2 | Asthma/Rhinitis
Nitrilotriacetic Acid/RGS6 | Lung Neoplasms/Urinary Bladder Neoplasms
Noise/KCNQ4 | Hearing Loss/Hearing Loss, Noise-Induced
o-Phthalaldehyde/HLA-DQB1 | Asthma/Occupational Diseases
Oxalates/CLCN5 * | Nephrocalcinosis/Kidney Calculi/Kidney Diseases
Papain/IL4R1 | Conjunctivitis/Asthma/Rhinitis
Parabens/CYP24A1 | Breast Neoplasms/Asthma
Parabens/KDR | Breast Neoplasms/Asthma
Pectins/HLA-DQB1 | Asthma/Occupational Diseases
Pentylenetetrazole/KCNMB3 | Epilepsies, Myoclonic/Epilepsy, Generalized/Epilepsy, Absence
Pesticide Synergists/CXCL14 | Carcinoma, Hepatocellular/Liver Neoplasms
Pesticide Synergists/PTK2 | Carcinoma, Hepatocellular/Liver Neoplasms
Pesticide Synergists/SMYD3 * | Carcinoma, Hepatocellular/Liver Neoplasms
Pesticide Synergists/UGT1A4 | Carcinoma, Hepatocellular/Liver Neoplasms
Pesticide Synergists/UGT1A8 | Carcinoma, Hepatocellular/Liver Neoplasms
Phenylurea Compounds/HSD11B2 | Diabetic Nephropathies/Diabetes Mellitus, Type 1
Phenylurea Compounds/TCF2 | Diabetic Nephropathies/Diabetic Neuropathies/Diabetes Mellitus, Type 1
Piperonyl Butoxide/CXCL14 | Carcinoma, Hepatocellular/Liver Neoplasms
Piperonyl Butoxide/SMYD3 * | Carcinoma, Hepatocellular/Liver Neoplasms
Pipobroman/CSF3R | Leukemia/Myelodysplastic Syndromes/Anemia, Aplastic
Pipobroman/CSF3R | Leukemia/Myelodysplastic Syndromes/Anemia, Aplastic
Platelet-Derived Growth Factor/CCNH | Mouth Neoplasms/Precancerous Conditions
Potassium Channel Blockers/KCNJ2 * | Long QT Syndrome/Arrhythmias, Cardiac
Potassium Channel Blockers/CAV3 * | Long QT Syndrome/Arrhythmias, Cardiac
Potassium Channel Blockers/KCNE2 * | Long QT Syndrome/Arrhythmias, Cardiac/Torsades de Pointes
Prostaglandin-Endoperoxide Synthases/IRAK3 | Pouchitis/Crohn Disease/Colitis, Ulcerative
Psoralens/APAF1 | Melanoma/Skin Neoplasms
Sodium Azide/PRND | Nervous System Diseases/Alzheimer Disease
Sodium Salicylate/KCNQ4 * | Deafness/Hearing Loss
Tartrazine/HRH2 | Angioedema/Asthma/Urticaria
Thionucleotides/RP2 | Retinitis Pigmentosa/Retinal Diseases
Tuberculin/HRH1 | Angioedema/Urticaria

Table 1. Some of the gene and environmental factor pairs with the most similar scores. The pairs with a star were found to have a biological relationship.

We found a plausible biological basis for several of the gene-environmental factor pairs. For example, for the sodium salicylate-KCNQ4 gene pair, Wu et al. [4] showed that salicylate blocks the action of the KCNQ4 gene, resulting in hearing loss. Furthermore, the protein antithrombin inhibits the protein complex consisting of the coagulation factor genes F3 and F7 [5]. This indicates a possible interaction between antithrombin and the genes F3 and F7 in the pathology of Coronary Thrombosis.

Several of the gene-environmental factor pairs were related to the same biological molecule. For instance, the pesticide synergist Piperonyl Butoxide and the SMYD3 gene both influence the Myc oncogene. Piperonyl Butoxide has been found to increase the activation of the Myc gene [6], and the interference of human SMYD3 microRNA has been found to decrease the binding of the promoter between the Tert gene and the Myc protein [7]. Given the importance of the Myc gene in cancer, there could be an interaction between Piperonyl Butoxide and the SMYD3 gene in liver cancer.

It is interesting to note that the HLA-DQB1 gene and the anti-histamine Meclizine were present in eight and fourteen of the top 63 environmental factor and gene pairs, respectively. This finding indicates that the HLA-DQB1 gene and Meclizine share a large number of diseases caused in common with many other environmental factors and genes, respectively. This may be because the pathways which are perturbed by HLA-DQB1 and Meclizine are similar to the ones perturbed by the other genes and environmental factors. Unfortunately, the annotation coverage of pathways perturbed by genes and environmental factors in databases like the Kyoto Encyclopedia of Genes and Genomes (KEGG) [8] is not complete, and thus no pathway association could be made.

Furthermore, even among the highest scoring pairs in which a biological relationship could be found, there were no papers in the literature which linked an environmental factor and gene in the pathology of a disease. For instance, the respective environmental factor and gene pair may perturb similar biological pathways, or the environmental factor may attenuate the mutation of a given gene in the pathology of their common diseases. These potential interactions may also serve as candidates for future experimentation.

4. CONCLUSIONS

We presented a method to identify gene-environmental factor pairs that may have plausible interactions from knowledge about genetic factors and environmental factors of disease. Among the top scoring gene-environmental factor pairs on the similarity score, we found several pairs with evidence in the literature to support an association. The pairs for which we found no existing relationships may be candidates for further investigation for a biological interaction in the causation of their common diseases.

References

[1] Becker, K.G., et al., The genetic association database. Nat Genet, 2004. 36(5): p. 431-2.
[2] Davis, A.P., et al., Comparative Toxicogenomics Database: a knowledgebase and discovery tool for chemical-gene-disease networks. Nucleic Acids Res, 2009. 37(Database issue): p. D786-92.
[3] Liu, Y.I., P.H. Wise, and A.J. Butte, The "etiome": identification and clustering of human disease etiological factors. BMC Bioinformatics, 2009. 10 Suppl 2: p. S14.
[4] Wu, T., et al., Effect of Salicylate on KCNQ4 of the Guinea Pig Outer Hair Cell. J Neurophysiol, 2010.
[5] Dellinger, R.P., Inflammation and coagulation: implications for the septic patient. Clin Infect Dis, 2003. 36(10): p. 1259-65.
[6] Kawai, M., et al., Elevation of cell proliferation via generation of reactive oxygen species by piperonyl butoxide contributes to its liver tumor-promoting effects in mice. Arch Toxicol, 2010. 84(2): p. 155-64.
[7] Liu, C., et al., The telomerase reverse transcriptase (hTERT) gene is a direct target of the histone methyltransferase SMYD3. Cancer Res, 2007. 67(6): p. 2626-31.
[8] Kanehisa, M. and S. Goto, KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res, 2000. 28(1): p. 27-30.