Scientific Data Mining in Astronomy
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Models of the Gene Must Inform Data-Mining Strategies in Genomics
entropy Review Models of the Gene Must Inform Data-Mining Strategies in Genomics Łukasz Huminiecki Department of Molecular Biology, Institute of Genetics and Animal Biotechnology, Polish Academy of Sciences, 00-901 Warsaw, Poland; [email protected] Received: 13 July 2020; Accepted: 22 August 2020; Published: 27 August 2020 Abstract: The gene is a fundamental concept of genetics, which emerged with the Mendelian paradigm of heredity at the beginning of the 20th century. However, the concept has since diversified. Somewhat different narratives and models of the gene developed in several sub-disciplines of genetics, that is in classical genetics, population genetics, molecular genetics, genomics, and, recently, also, in systems genetics. Here, I ask how the diversity of the concept impacts data-integration and data-mining strategies for bioinformatics, genomics, statistical genetics, and data science. I also consider theoretical background of the concept of the gene in the ideas of empiricism and experimentalism, as well as reductionist and anti-reductionist narratives on the concept. Finally, a few strategies of analysis from published examples of data-mining projects are discussed. Moreover, the examples are re-interpreted in the light of the theoretical material. I argue that the choice of an optimal level of abstraction for the gene is vital for a successful genome analysis. Keywords: gene concept; scientific method; experimentalism; reductionism; anti-reductionism; data-mining 1. Introduction The gene is one of the most fundamental concepts in genetics (It is as important to biology, as the atom is to physics or the molecule to chemistry.). The concept was born with the Mendelian paradigm of heredity, and fundamentally influenced genetics over 150 years [1]. -
A Review of Data Mining Techniques in Medicine
JOURNAL OF INFORMATION SYSTEMS & OPERATIONS MANAGEMENT, Vol. 14.1, May 2020 A REVIEW OF DATA MINING TECHNIQUES IN MEDICINE Ionela-Cătălina ZAMFIR1 Ana-Maria Mihaela IORDACHE2 Abstract: Data mining techniques found applications in many areas since their development. New techniques or combination of techniques are created continuously. Techniques like supervised or unsupervised learning are used for prediction different diseases and have the aim to identify (diagnosis) the disease and predict the incidence or make predictions about treatment and survival rate. This paper presents the new trends in applying data mining in healthcare area, taking into account the most relevant papers in this respect. By analyzing both applied and review articles, using text mining on keywords and abstracts, it is revealing the most used methodologies (Support Vector Machines, Artificial Neural Networks, K-Means Algorithm, Decision Trees, Logistic Regression) as well as the areas of interest (prediction of different diseases, like: breast cancer, lung cancer, heart diseases, diabetes, thyroid or kidney diseases). Keywords: data mining, predictions, literature review, disease, healthcare, medicine JEL classification: I15, C38 1. Introduction Data Mining techniques know a wide spread and applicability nowadays. Among the areas that use these techniques, the medical and economic fields have the most interesting applications. In economic area, the most studied issues are: the prediction of bankruptcy risk for companies, the prediction of fraud in insurance, fraud in financial statements, fraud in credit card transactions, or the customers’ classification for targeted marketing campaign. In medical field, some of the most approached issues are: the identification of cancer (breast, lung, or other types), hearth diseases, diabetes, skin disease or the prediction of different diseases incidence. -
A Proposed Data Mining Methodology and Its Application to Industrial Engineering
University of Tennessee, Knoxville TRACE: Tennessee Research and Creative Exchange Masters Theses Graduate School 8-2002 A Proposed Data Mining Methodology and its Application to Industrial Engineering Jose Solarte University of Tennessee - Knoxville Follow this and additional works at: https://trace.tennessee.edu/utk_gradthes Part of the Other Engineering Commons Recommended Citation Solarte, Jose, "A Proposed Data Mining Methodology and its Application to Industrial Engineering. " Master's Thesis, University of Tennessee, 2002. https://trace.tennessee.edu/utk_gradthes/2172 This Thesis is brought to you for free and open access by the Graduate School at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Masters Theses by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected]. To the Graduate Council: I am submitting herewith a thesis written by Jose Solarte entitled "A Proposed Data Mining Methodology and its Application to Industrial Engineering." I have examined the final electronic copy of this thesis for form and content and recommend that it be accepted in partial fulfillment of the requirements for the degree of Master of Science, with a major in Industrial Engineering. Denise F. Jackson, Major Professor We have read this thesis and recommend its acceptance: Robert E. Ford, Tyler A. Kress Accepted for the Council: Carolyn R. Hodges Vice Provost and Dean of the Graduate School (Original signatures are on file with official studentecor r ds.) To the Graduate Council: I am submitting herewith a thesis written by Jose Solarte entitled “A Proposed Data Mining Methodology and its Application to Industrial Engineering.” I have examined the final electronic copy of this thesis for form and content and recommend that it be accepted in partial fulfillment of the requirements for the degree of Master of Science, with a major in Industrial Engineering. -
Data Quality Through Data Integration: How Integrating Your IDEA Data Will Help Improve Data Quality
center for the integration of IDEA Data Data Quality Through Data Integration: How Integrating Your IDEA Data Will Help Improve Data Quality July 2018 Authors: Sara Sinani & Fred Edora High-quality data is essential when looking at student-level data, including data specifically focused on students with disabilities. For state education agencies (SEAs), it is critical to have a solid foundation for how data is collected and stored to achieve high-quality data. The process of integrating the Individuals with Disabilities Education Act (IDEA) data into a statewide longitudinal data system (SLDS) or other general education data system not only provides SEAs with more complete data, but also helps SEAs improve accuracy of federal reporting, increase the quality of and access to data within and across data systems, and make better informed policy decisions related to students with disabilities. Through the data integration process, including mapping data elements, reviewing data governance processes, and documenting business rules, SEAs will have developed documented processes and policies that result in more integral data that can be used with more confidence. In this brief, the Center for the Integration of IDEA Data (CIID) provides scenarios based on the continuum of data integration by focusing on three specific scenarios along the integration continuum to illustrate how a robust integrated data system improves the quality of data. Scenario A: Siloed Data Systems Data systems where the data is housed in separate systems and State Education Agency databases are often referred to as “data silos.”1 These data silos are isolated from other data systems and databases. -
Managing Data in Motion This Page Intentionally Left Blank Managing Data in Motion Data Integration Best Practice Techniques and Technologies
Managing Data in Motion This page intentionally left blank Managing Data in Motion Data Integration Best Practice Techniques and Technologies April Reeve AMSTERDAM • BOSTON • HEIDELBERG • LONDON NEW YORK • OXFORD • PARIS • SAN DIEGO SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO Morgan Kaufmann is an imprint of Elsevier Acquiring Editor: Andrea Dierna Development Editor: Heather Scherer Project Manager: Mohanambal Natarajan Designer: Russell Purdy Morgan Kaufmann is an imprint of Elsevier 225 Wyman Street, Waltham, MA 02451, USA Copyright r 2013 Elsevier Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions. This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein). Notices Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices, may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility. -
Informatica Cloud Data Integration
Informatica® Cloud Data Integration Microsoft SQL Server Connector Guide Informatica Cloud Data Integration Microsoft SQL Server Connector Guide March 2019 © Copyright Informatica LLC 2017, 2019 This software and documentation are provided only under a separate license agreement containing restrictions on use and disclosure. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica LLC. U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, duplication, disclosure, modification, and adaptation is subject to the restrictions and license terms set forth in the applicable Government contract, and, to the extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License. Informatica, the Informatica logo, Informatica Cloud, and PowerCenter are trademarks or registered trademarks of Informatica LLC in the United States and many jurisdictions throughout the world. A current list of Informatica trademarks is available on the web at https://www.informatica.com/trademarks.html. Other company and product names may be trade names or trademarks of their respective owners. Portions of this software and/or documentation are subject to copyright held by third parties. Required third party notices are included with the product. See patents at https://www.informatica.com/legal/patents.html. DISCLAIMER: Informatica LLC provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of noninfringement, merchantability, or use for a particular purpose. -
Discovering Knowledge in Data: an Introduction to Data Mining, Introduces the Reader to This Rapidly Growing field of Data Mining
WY045-FM September 24, 2004 10:2 DISCOVERING KNOWLEDGE IN DATA An Introduction to Data Mining DANIEL T. LAROSE Director of Data Mining Central Connecticut State University A JOHN WILEY & SONS, INC., PUBLICATION iii WY045-FM September 24, 2004 10:2 vi WY045-FM September 24, 2004 10:2 DISCOVERING KNOWLEDGE IN DATA i WY045-FM September 24, 2004 10:2 ii WY045-FM September 24, 2004 10:2 DISCOVERING KNOWLEDGE IN DATA An Introduction to Data Mining DANIEL T. LAROSE Director of Data Mining Central Connecticut State University A JOHN WILEY & SONS, INC., PUBLICATION iii WY045-FM September 24, 2004 10:2 Copyright ©2005 by John Wiley & Sons, Inc. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-646-8600, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008. Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. -
What to Look for When Selecting a Master Data Management Solution What to Look for When Selecting a Master Data Management Solution
What to Look for When Selecting a Master Data Management Solution What to Look for When Selecting a Master Data Management Solution Table of Contents Business Drivers of MDM .......................................................................................................... 3 Next-Generation MDM .............................................................................................................. 5 Keys to a Successful MDM Deployment .............................................................................. 6 Choosing the Right Solution for MDM ................................................................................ 7 Talend’s MDM Solution .............................................................................................................. 7 Conclusion ..................................................................................................................................... 8 Master data management (MDM) exists to enable the business success of an orga- nization. With the right MDM solution, organizations can successfully manage and leverage their most valuable shared information assets. Because of the strategic importance of MDM, organizations can’t afford to make any mistakes when choosing their MDM solution. This article will discuss what to look for in an MDM solution and why the right choice will help to drive strategic business initiatives, from cloud computing to big data to next-generation business agility. 2 It is critical for CIOs and business decision-makers to understand the value -
Data Mining Techniques in Diagnosis of Chronic Diseases
Journal of Applied Technology and Innovation (eISSN: 2600-7304) vol. 1, no. 2, (2017), pp. 10-27 Data Mining Techniques in Diagnosis of Chronic Diseases Keerthana Rajendran Faculty of Computing, Engineering & Technology Asia Pacific University of Technology & Innovation 57000 Kuala Lumpur, Malaysia Email: [email protected] Abstract - Chronic diseases and cancer are raising health concerns globally due to lower chances of survival when encountered with any of these diseases. The need to implement automated data mining techniques to enable cost-effective and early diagnosis of various diseases is fast becoming a trend in healthcare industry. The optimal techniques for prediction and diagnosis vary between different chronic diseases and the disease related-parameters under study. This review article provides a holistic view of the types of machine learning techniques that can be used in diagnosis and prediction of several chronic diseases such as diabetes, cardiovascular and brain diseases, chronic kidney disease and a few types of cancers, namely breast, lung and brain cancers. Overall, the computer-aided, automatic data mining techniques that are commonly employed in diagnosis and prognosis of chronic diseases include decision tree algorithms, Naïve Bayes, association rule, multilayer perceptron (MLP), Random Forest and support vector machines (SVM), among others. As the accuracy and overall performance of the classifiers differ for every disease, this article provides a mean to understand the ideal machine learning techniques for prediction of several well-known chronic diseases. Index Terms - Data mining, Healthcare systems, Machine Learning, Big data analytics Introduction he evolution of healthcare industry from the traditional healthcare system to the T utilization of Electronic Health Records (EHR) system has introduced the concept of big data in the healthcare sector. -
Astroinformatics: Image Processing and Analysis of Digitized Astronomical Data with Web-Based Implementation
ASTROINFORMATICS: IMAGE PROCESSING AND ANALYSIS OF DIGITIZED ASTRONOMICAL DATA WITH WEB-BASED IMPLEMENTATION N. Kirov(1;2), O. Kounchev(2), M. Tsvetkov(3) (1)Computer Science Department, New Bulgarian University (2)Institute of Mathematics and Informatics, Bulgarian Academy of Sciences (3)Institute of Astronomy, Bulgarian Academy of Sciences [email protected], [email protected], [email protected] ABSTRACT The newly born area of Astroinformatics has arisen as an interdisciplinary domain from Astronomy and Information and Communication Technologies (ICT). Recently, four institutes of the Bulgarian Academy of Sciences launched a joint project called “Astroinformatics”. The main goal of the project is the development of methods and techniques for preservation and exploitation of the scientific, cultural and historic heritage of the astronomical observations. The Wide-Field Plate Data Base is an ICT project of the Institute of Astronomy, which has been launched in 1991, supported by International Astronomic Union [1]. Now over two million photographic plates, obtained with professional telescopes at 125 observatories worldwide, are identified and collected in this database. Digitization of photographic plates is an actual task for astronomical society. So far 150 000 plates have been digitized through several European research programs. As a result, a huge amount of data is collected (with about 2TB in size). The access, manipulation and data mining of this data is a serious challenge for the ICT community. In this article we discuss goals, tasks and achieve- ments in the area of Astroinformatics. 1 THE WIDE-FIELD PLATE DATA BASE The basic information for all wide-field (> 1o) photographic astronomical plate archives and for their content, stored in many observatories is collected in the Wide-Field Plate Database (WFPDB, skyarchive.org), established in 1991 and developed in the Institute of Astronomy, Bulgar- ian Academy of Sciences. -
Using ETL, EAI, and EII Tools to Create an Integrated Enterprise
Data Integration: Using ETL, EAI, and EII Tools to Create an Integrated Enterprise Colin White Founder, BI Research TDWI Webcast October 2005 TDWI Data Integration Study Copyright © BI Research 2005 2 Data Integration: Barrier to Application Development Copyright © BI Research 2005 3 Top Three Data Integration Inhibitors Copyright © BI Research 2005 4 Staffing and Budget for Data Integration Copyright © BI Research 2005 5 Data Integration: A Definition A framework of applications, products, techniques and technologies for providing a unified and consistent view of enterprise-wide business data Copyright © BI Research 2005 6 Enterprise Business Data Copyright © BI Research 2005 7 Data Integration Architecture Source Target Data integration Master data applications Business domain dispersed management (MDM) MDM applications integrated internal data & external Data integration techniques data Data Data Data propagation consolidation federation Changed data Data transformation (restructure, capture (CDC) cleanse, reconcile, aggregate) Data integration technologies Enterprise data Extract transformation Enterprise content replication (EDR) load (ETL) management (ECM) Enterprise application Right-time ETL Enterprise information integration (EAI) (RT-ETL) integration (EII) Web services (services-oriented architecture, SOA) Data integration management Data quality Metadata Systems management management management Copyright © BI Research 2005 8 Data Integration Techniques and Technologies Data Consolidation centralized data Extract, transformation -
Astro2020 State of the Profession Consideration White Paper
Astro2020 State of the Profession Consideration White Paper Realizing the potential of astrostatistics and astroinformatics September 27, 2019 Principal Author: Name: Gwendolyn Eadie4;5;6;15;17 Email: [email protected] Co-authors: Thomas Loredo1;19, Ashish A. Mahabal2;15;16;18, Aneta Siemiginowska3;15, Eric Feigelson7;15, Eric B. Ford7;15, S.G. Djorgovski2;20, Matthew Graham2;15;16, Zˇeljko Ivezic´6;16, Kirk Borne8;15, Jessi Cisewski-Kehe9;15;17, J. E. G. Peek10;11, Chad Schafer12;19, Padma A. Yanamandra-Fisher13;15, C. Alex Young14;15 1Cornell University, Cornell Center for Astrophysics and Planetary Science (CCAPS) & Department of Statistical Sciences, Ithaca, NY 14853, USA 2Division of Physics, Mathematics, & Astronomy, California Institute of Technology, Pasadena, CA 91125, USA 3Center for Astrophysics j Harvard & Smithsonian, Cambridge, MA 02138, USA 4eScience Institute, University of Washington, Seattle, WA 98195, USA 5DIRAC Institute, Department of Astronomy, University of Washington, Seattle, WA 98195, USA 6Department of Astronomy, University of Washington, Seattle, WA 98195, USA 7Penn State University, University Park, PA 16802, USA 8Booz Allen Hamilton, Annapolis Junction, MD, USA 9Department of Statistics & Data Science, Yale University, New Haven, CT 06511, USA 10Department of Physics & Astronomy, Johns Hopkins University, Baltimore, MD 21218, USA arXiv:1909.11714v1 [astro-ph.IM] 25 Sep 2019 11Space Telescope Science Institute, Baltimore, MD 21218, USA 12Department of Statistics & Data Science Carnegie Mellon University, Pittsburgh,