Models of the Gene Must Inform Data-Mining Strategies in Genomics
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
A Review of Data Mining Techniques in Medicine
JOURNAL OF INFORMATION SYSTEMS & OPERATIONS MANAGEMENT, Vol. 14.1, May 2020 A REVIEW OF DATA MINING TECHNIQUES IN MEDICINE Ionela-Cătălina ZAMFIR1 Ana-Maria Mihaela IORDACHE2 Abstract: Data mining techniques found applications in many areas since their development. New techniques or combination of techniques are created continuously. Techniques like supervised or unsupervised learning are used for prediction different diseases and have the aim to identify (diagnosis) the disease and predict the incidence or make predictions about treatment and survival rate. This paper presents the new trends in applying data mining in healthcare area, taking into account the most relevant papers in this respect. By analyzing both applied and review articles, using text mining on keywords and abstracts, it is revealing the most used methodologies (Support Vector Machines, Artificial Neural Networks, K-Means Algorithm, Decision Trees, Logistic Regression) as well as the areas of interest (prediction of different diseases, like: breast cancer, lung cancer, heart diseases, diabetes, thyroid or kidney diseases). Keywords: data mining, predictions, literature review, disease, healthcare, medicine JEL classification: I15, C38 1. Introduction Data Mining techniques know a wide spread and applicability nowadays. Among the areas that use these techniques, the medical and economic fields have the most interesting applications. In economic area, the most studied issues are: the prediction of bankruptcy risk for companies, the prediction of fraud in insurance, fraud in financial statements, fraud in credit card transactions, or the customers’ classification for targeted marketing campaign. In medical field, some of the most approached issues are: the identification of cancer (breast, lung, or other types), hearth diseases, diabetes, skin disease or the prediction of different diseases incidence. -
Scientific Data Mining in Astronomy
SCIENTIFIC DATA MINING IN ASTRONOMY Kirk D. Borne Department of Computational and Data Sciences, George Mason University, Fairfax, VA 22030, USA [email protected] Abstract We describe the application of data mining algorithms to research prob- lems in astronomy. We posit that data mining has always been fundamen- tal to astronomical research, since data mining is the basis of evidence- based discovery, including classification, clustering, and novelty discov- ery. These algorithms represent a major set of computational tools for discovery in large databases, which will be increasingly essential in the era of data-intensive astronomy. Historical examples of data mining in astronomy are reviewed, followed by a discussion of one of the largest data-producing projects anticipated for the coming decade: the Large Synoptic Survey Telescope (LSST). To facilitate data-driven discoveries in astronomy, we envision a new data-oriented research paradigm for astron- omy and astrophysics – astroinformatics. Astroinformatics is described as both a research approach and an educational imperative for modern data-intensive astronomy. An important application area for large time- domain sky surveys (such as LSST) is the rapid identification, charac- terization, and classification of real-time sky events (including moving objects, photometrically variable objects, and the appearance of tran- sients). We describe one possible implementation of a classification broker for such events, which incorporates several astroinformatics techniques: user annotation, semantic tagging, metadata markup, heterogeneous data integration, and distributed data mining. Examples of these types of collaborative classification and discovery approaches within other science disciplines are presented. arXiv:0911.0505v1 [astro-ph.IM] 3 Nov 2009 1 Introduction It has been said that astronomers have been doing data mining for centuries: “the data are mine, and you cannot have them!”. -
A Proposed Data Mining Methodology and Its Application to Industrial Engineering
University of Tennessee, Knoxville TRACE: Tennessee Research and Creative Exchange Masters Theses Graduate School 8-2002 A Proposed Data Mining Methodology and its Application to Industrial Engineering Jose Solarte University of Tennessee - Knoxville Follow this and additional works at: https://trace.tennessee.edu/utk_gradthes Part of the Other Engineering Commons Recommended Citation Solarte, Jose, "A Proposed Data Mining Methodology and its Application to Industrial Engineering. " Master's Thesis, University of Tennessee, 2002. https://trace.tennessee.edu/utk_gradthes/2172 This Thesis is brought to you for free and open access by the Graduate School at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Masters Theses by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected]. To the Graduate Council: I am submitting herewith a thesis written by Jose Solarte entitled "A Proposed Data Mining Methodology and its Application to Industrial Engineering." I have examined the final electronic copy of this thesis for form and content and recommend that it be accepted in partial fulfillment of the requirements for the degree of Master of Science, with a major in Industrial Engineering. Denise F. Jackson, Major Professor We have read this thesis and recommend its acceptance: Robert E. Ford, Tyler A. Kress Accepted for the Council: Carolyn R. Hodges Vice Provost and Dean of the Graduate School (Original signatures are on file with official studentecor r ds.) To the Graduate Council: I am submitting herewith a thesis written by Jose Solarte entitled “A Proposed Data Mining Methodology and its Application to Industrial Engineering.” I have examined the final electronic copy of this thesis for form and content and recommend that it be accepted in partial fulfillment of the requirements for the degree of Master of Science, with a major in Industrial Engineering. -
Discovering Knowledge in Data: an Introduction to Data Mining, Introduces the Reader to This Rapidly Growing field of Data Mining
WY045-FM September 24, 2004 10:2 DISCOVERING KNOWLEDGE IN DATA An Introduction to Data Mining DANIEL T. LAROSE Director of Data Mining Central Connecticut State University A JOHN WILEY & SONS, INC., PUBLICATION iii WY045-FM September 24, 2004 10:2 vi WY045-FM September 24, 2004 10:2 DISCOVERING KNOWLEDGE IN DATA i WY045-FM September 24, 2004 10:2 ii WY045-FM September 24, 2004 10:2 DISCOVERING KNOWLEDGE IN DATA An Introduction to Data Mining DANIEL T. LAROSE Director of Data Mining Central Connecticut State University A JOHN WILEY & SONS, INC., PUBLICATION iii WY045-FM September 24, 2004 10:2 Copyright ©2005 by John Wiley & Sons, Inc. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-646-8600, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008. Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. -
Data Mining Techniques in Diagnosis of Chronic Diseases
Journal of Applied Technology and Innovation (eISSN: 2600-7304) vol. 1, no. 2, (2017), pp. 10-27 Data Mining Techniques in Diagnosis of Chronic Diseases Keerthana Rajendran Faculty of Computing, Engineering & Technology Asia Pacific University of Technology & Innovation 57000 Kuala Lumpur, Malaysia Email: [email protected] Abstract - Chronic diseases and cancer are raising health concerns globally due to lower chances of survival when encountered with any of these diseases. The need to implement automated data mining techniques to enable cost-effective and early diagnosis of various diseases is fast becoming a trend in healthcare industry. The optimal techniques for prediction and diagnosis vary between different chronic diseases and the disease related-parameters under study. This review article provides a holistic view of the types of machine learning techniques that can be used in diagnosis and prediction of several chronic diseases such as diabetes, cardiovascular and brain diseases, chronic kidney disease and a few types of cancers, namely breast, lung and brain cancers. Overall, the computer-aided, automatic data mining techniques that are commonly employed in diagnosis and prognosis of chronic diseases include decision tree algorithms, Naïve Bayes, association rule, multilayer perceptron (MLP), Random Forest and support vector machines (SVM), among others. As the accuracy and overall performance of the classifiers differ for every disease, this article provides a mean to understand the ideal machine learning techniques for prediction of several well-known chronic diseases. Index Terms - Data mining, Healthcare systems, Machine Learning, Big data analytics Introduction he evolution of healthcare industry from the traditional healthcare system to the T utilization of Electronic Health Records (EHR) system has introduced the concept of big data in the healthcare sector. -
Epigenetics Is Normal Science, but Don't Call It Lamarckian
Extended Abstract Volume 6:3, 2020 DOI: 6.2472/imedipce.2020.6.002 Journal of Clinical Epigenetics ISSN: 2472-1158 Open Access Epigenetics is Normal Science, But Don’t Call It Lamarckian David Penny New Zealand Extended Abstract they are general (but not necessarily universal) because they are found in Excavates, which many people consider are ancestral to other eukaryotes [6, 7]. Similarly, an important point is that Epigenetic mechanisms are increasingly being accepted as an integral methylation, phosphorylation, etc. of the DNA and of the histones is a part of normal science. Earlier genetic studies appear to have part of normal evolution (at least in eukaryotes that have the assumed that the only changes were ‘mutations’ as a result of a nucleosome structure) [8]. One of the most interesting aspects is change in the DNA sequence. If any such changes altered the amino methylation (and hydroxymethylation) of cytosine, and this is found in acid sequences then it was effectively a mutation, and if it occurred in many deep branching eukaryotes [9]. Typically there is methylation of a germline cell (including in unicellular organisms) then it could be cytosine molecules, especially in CpG (cytosine followed by guanine) passed on to subsequent generations. However, it is increasingly regions. The role of small RNAs in helping regulate the levels of apparent that other changes are also important, for example, changes protein expression is discussed in [10]. The main point here is that to methylation patterns, and to the histone proteins. Epigenetics was protein expression levels are very variable in all cells – it is not just sometimes consider ‘additional’ to classical genetics, but it is still fully classical genetics that is important; there are many epigenetic dependent on DNA, RNA and protein sequences, and at least since [1] mechanisms affecting the differences in protein expression. -
Using Classical Genetics Simulator (CGS) to Teach Students the Basics of Genetic Research
Tested Studies for Laboratory Teaching Proceedings of the Association for Biology Laboratory Education Vol. 33, 85–94, 2012 Using Classical Genetics Simulator (CGS) to Teach Students the Basics of Genetic Research Jean Heitz1, Mark Wolansky2, and Ben Adamczyk3 1University of Wisconsin, Zoology Department, 250 N. Mills Street, Madison WI 53706 USA 2University of Alberta, Department of Biological Science, Edmonton AB, CAN T6G 2E9 3Genedata, MA, A-1 Cranberry HI, Lexington MA 02421 USA ([email protected];[email protected];[email protected]) Some experiments are not well suited to large introductory labs because of space, time and funding constraints. Among these are classical Mendelian genetics investigations. Cyber labs can overcome these constraints. Classi- cal Genetics Simulator (CGS) gives students the opportunity to perform controlled crosses with model organisms much like a geneticist would do. CGS provides populations of Drosophila, Arabidopsis or mice with unknown patterns of inheritance and gives students the tools to design and perform experiments to discover these patterns. Students require an understanding of the what, why and how of solving genetic problems to successfully complete our cyber genetics lab activities. Keywords: Mendelian genetics, simulator, classical genetics simulator (CGS), linkage, monohybrid, dihybrid Introduction In large introductory biology classes some labs are dif- How can students get practice doing what geneticists ficult to do because of space, time and funding constraints. actually do? Among these are classical Mendelian genetics investiga- An obvious way for students to get practice doing what tions. As a result, we have turned to cyber labs. Working geneticists do is to conduct actual crosses. -
The Sustained Impact of Model Organisms—In Genetics and Epigenetics
| COMMENTARY The Sustained Impact of Model Organisms—in Genetics and Epigenetics Nancy M. Bonini*,†,1 and Shelley L. Berger‡,1 *Department of Cellular and Developmental Biology, Perlman School of Medicine, †Department of Biology, and ‡Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania 19104 KEYWORDS classical genetics; epigenetics; behavior; genomics; human disease Now that DNA sequencing can reveal disease-relevant muta- The Rise of Model Organism Genetics and the tions in humans, it is appropriate to consider the history of Embrace of Molecular Genetics genetic research on model organisms, and whether model Model organism genetics has revealed many fundamental organisms will continue to play an important role. Our view is biological processes. This was achieved by applying unbiased that genetic research in model organisms has had an in- genetic screens to reveal genes and their mechanisms of disputably enormous impact on our understanding of com- action in processes such as cell cycle control (Nasmyth plex biological pathways and their epigenetic regulation, and 2001), basic body plan specification and mechanisms of de- on the development of tools and approaches that have been velopment (Lewis 1978; Nusslein-Volhard and Wieschaus crucial for studies of human biology. We submit that model 1980), and cell death pathways (Ellis and Horvitz 1986; organisms will remain indispensable for interpreting complex Avery and Horvitz 1987), all of which were recognized with genomic data, for testing hypotheses regarding function, and Nobel prizes. for further study of complex behaviors, to reveal not only The sequencing of genomes revealed the breadth of the novel genetic players, but also the profound impact of epige- conservation of genes, and of entire pathways. -
KDD and Data Mining
StatisticStatistic MethodsMethods inin DataData MiningMining IntroductionIntroduction ProfessorProfessor Dr.Dr. GholamrezaGholamreza NakhaeizadehNakhaeizadeh content IntroductionIntroduction LiteratureLiterature used used WhyWhy Data Data Mining? Mining? ExamplesExamples of of large large databases databases WhatWhat is is Data Data Mining? Mining? InterdisciplinaryInterdisciplinary aspects aspects of of Data Data Mining Mining OtherOther issues issues in in recent recent data data analysis: analysis: Web Web Mining, Mining, Text Text Mining Mining TypicalTypical Data Data Mining Mining Systems Systems ExamplesExamples of of Data Data Mining Mining Tools Tools ComparisonComparison of of Data Data Mining Mining Tools Tools HistoryHistory of of Data Data Mining, Mining, Data Data Mining: Mining: Data Data Mining Mining rapid rapid development development SomeSome European European funded funded projects projects ScientificScientific Networking Networking and and partnership partnership ConferencesConferences and and Journals Journals on on Data Data Mining Mining FurtherFurther References References 2 Literatur used (1) Principles of Data Mining Pang-Ning Tan, Jiawei Han and David J. Hand, Heikki Mannila, Michael Steinbach, Micheline Kamber Padhraic Smyth Vipin Kumar 3 Literature Used (2) http://cse.stanford.edu/class/sophomore-college/projects-00/neural-networks/ http://www.cs.cmu.edu/~awm/tutorials http://www.crisp-dm.org/CRISPwP-0800.pdf http://en.wikipedia.org/wiki/Feedforward_neural_network http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html#Feedback%20networks -
G14TBS Part II: Population Genetics
G14TBS Part II: Population Genetics Dr Richard Wilkinson Room C26, Mathematical Sciences Building Please email corrections and suggestions to [email protected] Spring 2015 2 Preliminaries These notes form part of the lecture notes for the mod- ule G14TBS. This section is on Population Genetics, and contains 14 lectures worth of material. During this section will be doing some problem-based learning, where the em- phasis is on you to work with your classmates to generate your own notes. I will be on hand at all times to answer questions and guide the discussions. Reading List: There are several good introductory books on population genetics, although I found their level of math- ematical sophistication either too low or too high for the purposes of this module. The following are the sources I used to put together these lecture notes: • Gillespie, J. H., Population genetics: a concise guide. John Hopkins University Press, Baltimore and Lon- don, 2004. • Hartl, D. L., A primer of population genetics, 2nd edition. Sinauer Associates, Inc., Publishers, 1988. • Ewens, W. J., Mathematical population genetics: (I) Theoretical Introduction. Springer, 2000. • Holsinger, K. E., Lecture notes in population genet- ics. University of Connecticut, 2001-2010. Available online at http://darwin.eeb.uconn.edu/eeb348/lecturenotes/notes.html. • Tavar´eS., Ancestral inference in population genetics. In: Lectures on Probability Theory and Statistics. Ecole d'Et´esde Probabilit´ede Saint-Flour XXXI { 2001. (Ed. Picard J.) Lecture Notes in Mathematics, Springer Verlag, 1837, 1-188, 2004. Available online at http://www.cmb.usc.edu/people/stavare/STpapers-pdf/T04.pdf Gillespie, Hartl and Holsinger are written for non-mathematicians and are easiest to follow. -
Mendelian Gene ” and the ” Molecular Gene ” : Two Relevant Concepts of Genetic Units V Orgogozo, Alexandre E
The ” Mendelian Gene ” and the ” Molecular Gene ” : Two Relevant Concepts of Genetic Units V Orgogozo, Alexandre E. Peluffo, Baptiste Morizot To cite this version: V Orgogozo, Alexandre E. Peluffo, Baptiste Morizot. The ” Mendelian Gene ” and the ”Molec- ular Gene ” : Two Relevant Concepts of Genetic Units. Virginie Orgogozo. Genes and Evolu- tion, 119, Elsevier, pp.1-26, 2016, Current Topics in Developmental Biology, 978-0-12-417194-7. 10.1016/bs.ctdb.2016.03.002. hal-01354346 HAL Id: hal-01354346 https://hal.archives-ouvertes.fr/hal-01354346 Submitted on 19 Aug 2016 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. ARTICLE IN PRESS The “Mendelian Gene” and the “Molecular Gene”: Two Relevant Concepts of Genetic Units V. Orgogozo*,1, A.E. Peluffo*, B. Morizot† *Institut Jacques Monod, UMR 7592, CNRS-Universite´ Paris Diderot, Sorbonne Paris Cite´, Paris, France †Universite´ Aix-Marseille, CNRS UMR 7304, Aix-en-Provence, France 1Corresponding author: e-mail address: [email protected] Contents 1. Introduction 2 2. The “Mendelian Gene” and the “Molecular Gene” 5 3. Current Literature Often Confuses the “Mendelian Gene” and the “Molecular Gene” Concepts 7 4. -
Molecular Population Genetics
| FLYBOOK ECOLOGY AND EVOLUTION Molecular Population Genetics Sònia Casillas*,† and Antonio Barbadilla*,†,1 *Institut de Biotecnologia i de Biomedicina, and yDepartament de Genètica i de Microbiologia, Campus Universitat Autònoma de Barcelona (UAB), 08193 Cerdanyola del Vallès, Barcelona, Spain ORCID IDs: 0000-0001-8191-0062 (S.C.); 0000-0002-0374-1475 (A.B.) ABSTRACT Molecular population genetics aims to explain genetic variation and molecular evolution from population genetics principles. The field was born 50 years ago with the first measures of genetic variation in allozyme loci, continued with the nucleotide sequencing era, and is currently in the era of population genomics. During this period, molecular population genetics has been revolutionized by progress in data acquisition and theoretical developments. The conceptual elegance of the neutral theory of molecular evolution or the footprint carved by natural selection on the patterns of genetic variation are two examples of the vast number of inspiring findings of population genetics research. Since the inception of the field, Drosophila has been the prominent model species: molecular variation in populations was first described in Drosophila and most of the population genetics hypotheses were tested in Drosophila species. In this review, we describe the main concepts, methods, and landmarks of molecular population genetics, using the Drosophila model as a reference. We describe the different genetic data sets made available by advances in molecular technologies, and the theoretical developments fostered by these data. Finally, we review the results and new insights provided by the population genomics approach, and conclude by enumerating challenges and new lines of inquiry posed by in- creasingly large population scale sequence data.