Identifying Interacting Environmental Factor – Gene Pairs

Chad Kimmel, MS¹, Jonathan Lustgarten, PhD³, An-Kwok Ian Wong, MS², Shyam Visweswaran, MD, PhD¹,²

¹Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA
²Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA
³University of Pennsylvania School of Veterinary Medicine, Philadelphia, PA

ABSTRACT

Most diseases are caused by a combination of environmental and genetic etiological factors, and these factors may interact in causing disease. We conjectured that an environmental factor and a gene that are associated with the same set of diseases are likely to interact. We extracted environmental factor – disease associations and gene – disease associations from freely accessible online databases and the research literature, and identified environmental factor – gene pairs that were similar in being associated with a common set of diseases. For several of the pairs that we examined, we found evidence for plausible biological interactions in the literature. We postulate that the remaining pairs may represent novel interactions between an environmental factor and a gene.

INTRODUCTION

The etiological factors associated with the development of human disease are broadly categorized into environmental and genetic factors. While infections like community-acquired pneumonia are predominantly influenced by environmental factors and Mendelian diseases like sickle cell anemia are predominantly influenced by genetic factors, many common diseases like coronary heart disease are influenced by both environmental and genetic factors that likely interact with each other.

Several freely accessible online databases now collate information on genetic factors associated with disease, such as the Online Mendelian Inheritance in Man (OMIM), the Genetic Association Database [1], and GeneCards. Fewer sources of information are available on environmental factors associated with diseases. One freely accessible database is the CHE Toxicant and Disease Database [2], which contains curated information on chemicals/toxins associated with disease, and another source is the dataset provided by Liu et al. in their paper on the “etiome” [3].

Collated data on environmental factor – disease associations and gene – disease associations can provide a wealth of information. For example, identifying diseases that are etiologically similar to a disease of interest may be useful in unraveling the biological mechanisms underlying the disease (based on the already characterized mechanisms in the similar diseases) or in investigating new therapies (based on the therapies in use in the similar diseases).

Several papers have described innovative methods for extracting environmental factors of disease and for analyzing environmental factors in combination with genetic factors. Liu et al. extracted a comprehensive list of associations between diseases and environmental factors using the Medical Subject Headings (MeSH) annotations of MEDLINE articles and combined it with genetic factors of disease to characterize the “etiome” profile of over 800 diseases [3]. Patel et al. developed a new method called an Environment-Wide Association Study (EWAS), similar to a Genome-Wide Association Study (GWAS), to extract environmental factors of disease and applied it to type 2 diabetes mellitus [4]. Gohlke et al. extracted environmental factors of disease by identifying key molecular pathways that are jointly associated with genetic and environmental factors using a gene-centric database, and validated their newly found associations with known chemical-disease relationships and transcriptional regulation data [5]. We followed the strategies used by Liu et al. to compile datasets of environmental factor – disease and gene – disease associations.

We conjectured that an environmental factor and a gene that are associated with the same set of diseases are likely to have a biological interaction. In this paper, we computed a similarity measure for an environmental factor and a gene. We then investigated the most similar environmental factor – gene pairs for plausible biological interactions. We defined an interaction as either the environmental factor having a direct influence on the gene, or the environmental factor and the gene (along with its protein product) having a direct influence on a common biological molecule (e.g., gene, protein, metabolite).

METHODS

In this section, we briefly describe the extraction of environmental and genetic factors of diseases, the similarity measure we used, and the method we followed for the selection of promising environmental factor – gene pairs.

Environmental Factors. We obtained environmental factors of human diseases from two sources. We obtained chemical-disease associations from the CHE Toxicant and Disease Database [2]. In addition, we extracted environmental factor – disease associations from the MeSH annotations of MEDLINE articles using the strategies developed by Liu et al. [3]. MeSH is a comprehensive controlled vocabulary that contains over 25,000 descriptors (also known as subject headings) and over 80 qualifiers (also known as subheadings). An article in MEDLINE is typically annotated with several descriptor/qualifier pairs. For example, "peptic ulcer" is a descriptor, "chemically induced" is a qualifier, and "peptic ulcer/chemically induced" describes articles on peptic ulcers that are chemically induced (e.g., by a drug like indomethacin). Liu et al. identified several patterns of pairs of MeSH qualifiers to infer environmental factor → disease associations. For example, the association indomethacin → peptic ulcer is inferred from the following two annotations of an article: "peptic ulcer/chemically induced" and "indomethacin/adverse effects" (in each annotation, the MeSH descriptor precedes the slash and the MeSH qualifier follows it).

Genetic Factors. We obtained genetic factors of human diseases from the Genetic Association Database (GAD), which contains gene-disease associations that have been curated from the literature [1]. The GAD uses standardized identifiers for genes (NCBI Entrez Gene identifiers) and diseases (MeSH identifiers) that enable easy machine processing.

Similarity Measure. A similarity measure indicates the strength of commonality between a pair of entities (e.g., an environmental factor and a genetic factor) based on their properties (e.g., associated diseases). We used the Jaccard similarity measure to calculate the similarity between an environmental factor E and a genetic factor G, as follows:

$$J(E, G) = \frac{S_{11}}{S_{11} + S_{10} + S_{01}}$$

where $S_{11}$ is the number of diseases associated with both G and E, $S_{10}$ is the number of diseases associated with E but not G, and $S_{01}$ is the number of diseases associated with G but not E. The Jaccard similarity measure varies between 0 and 1, where 1 denotes that E and G are associated with exactly the same set of diseases, and 0 denotes that E and G have no common associated diseases. We computed the Jaccard similarity measure only for those genetic and environmental factors that were associated with at least three diseases.

Environmental Factor – Gene Pairs. We examined environmental factor – gene pairs with a Jaccard similarity measure of 0.4 or greater for plausible interactions between members of the pair. We chose the threshold of 0.4 somewhat arbitrarily, so that we had a manageable number of high-scoring environmental factor – gene pairs to examine. We used the research literature and Ingenuity (Ingenuity Systems©, www.ingenuity.com) to identify interactions. Ingenuity is a systems biology database that contains manually annotated relationships between biological agents such as chemicals, drugs, genes, and proteins. In Ingenuity, we searched for evidence that the environmental factor in a pair directly influences the gene. We also searched for biological molecules (genes, metabolites, etc.) that are related to both the environmental factor and the gene in a pair.

RESULTS

We extracted 51,994 disease-environmental associations between 1,911 diseases and 5,801 environmental agents from the CHE Toxicant and Disease Database and MEDLINE. We extracted 8,872 disease-gene associations between 889 diseases and 1,891 genes from the GAD. The CHE Toxicant and Disease Database and MEDLINE annotations were downloaded in May 2009. After etiological factors with fewer than three associated diseases were excluded, the number of environmental factors decreased to 2,459 and the number of genes decreased to 1,469. This resulted in more than three and a half million environmental factor – gene pairs for which we computed the Jaccard similarity measure.

We identified a total of 63 environmental factor – gene pairs with a similarity measure of 0.4 or greater. Table 1 gives a list of the 63 pairs and the common associated diseases. The maximum number of common diseases for a pair was three.

We found evidence for an interaction for 10 of the 63 environmental factor – gene pairs. We now summarize some of this evidence. For the sodium salicylate – KCNQ4 gene pair, Wu et al. [6] showed that salicylate blocks the action of the KCNQ4 gene, resulting in hearing loss. For the antithrombins – F3 pair, antithrombin inhibits the complex consisting of coagulation factors III and VII that are produced by genes F3 and F7 [7].

Several of the environmental factor – gene pairs had an influence on the same biological molecule. For instance, in the pair piperonyl butoxide – SMYD3, both piperonyl butoxide and SMYD3 have an influence on the c-Myc oncogene which has been implicated in the