Survey of Object-Based Data Reduction Techniques in Observational Astronomy


Open Phys. 2016; 14:579–587 | Research Article | Open Access
DOI 10.1515/phys-2016-0064
Received Sep 03, 2016; accepted Oct 26, 2016

Szymon Łukasik*, André Moitinho, Piotr A. Kowalski, António Falcão, Rita A. Ribeiro, and Piotr Kulczycki

Abstract: Dealing with astronomical observations represents one of the most challenging areas of big data analytics. Besides the huge variety of data types and the dynamics related to continuous data flow from multiple sources, handling enormous volumes of data is essential. This paper provides an overview of methods aimed at reducing both the number of features/attributes and the number of data instances. It concentrates on data mining approaches that are not related to instruments and observation tools but instead work on processed object-based data. The main goal of this article is to describe existing datasets on which algorithms are frequently tested, to characterize and classify the available data reduction algorithms, and to identify promising solutions capable of addressing present and future challenges in astronomy.

Keywords: astronomy; big data; dimensionality reduction; feature extraction; data condensation

PACS: 93.85.Bc

*Corresponding Author: Szymon Łukasik: Faculty of Physics and Applied Computer Science, AGH University of Science and Technology; Systems Research Institute, Polish Academy of Sciences; E-mail: [email protected]
André Moitinho: CENTRA, Universidade de Lisboa, FCUL, Portugal; E-mail: [email protected]
Piotr A. Kowalski: Faculty of Physics and Applied Computer Science, AGH University of Science and Technology; Systems Research Institute, Polish Academy of Sciences; E-mail: [email protected]
António Falcão: Center of Technology and Systems, UNINOVA, Portugal; E-mail: [email protected]
Rita A. Ribeiro: Center of Technology and Systems, UNINOVA, Portugal; E-mail: [email protected]
Piotr Kulczycki: Faculty of Physics and Applied Computer Science, AGH University of Science and Technology; Systems Research Institute, Polish Academy of Sciences; E-mail: [email protected]

© 2016 S. Łukasik et al., published by De Gruyter Open. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.

1 Introduction

Astronomy stands at the forefront of big data analytics. In recent decades it has acquired tools which have enabled unprecedented growth in the amount of generated data and, consequently, of information which needs to be processed. This has led to the creation of two specific fields of scientific research: astrostatistics, which applies statistics to the study and analysis of astronomical data, and astroinformatics, which uses information/communications technologies to solve the big data problems faced in astronomy [58].

Since the times of individual observations with basic optical instruments, astronomy has transformed into a domain employing more than 1900 observatories (the International Astronomical Union code list currently holds 1984 records [23]). The sizes of catalogs of astronomical objects have reached petabytes, and they may contain billions of instances described by hundreds of parameters [14]. As such, the obstacles of astronomical data analysis perfectly exemplify the three main challenges of Big Data, namely volume, velocity and variety (also known as the 3Vs). Volume corresponds to both the large number of instances and of characteristics (features), velocity relates to the dynamics of the data flow, and variety stands for the broad range of data types and data sources [17].

This paper summarizes research efforts addressing the first of these challenges (volume). Its goal is to present techniques aimed at alleviating the problems of data dimensionality and numerosity from a data mining perspective, as well as to suggest suitable algorithms for upcoming challenges. Data is seen here as a set of astronomical objects and their properties (or their spectra), i.e. data already processed from the raw signals/images typically present at the instrument's level.
Similarly, the term "reduction" corresponds here purely to the transformation of object-based data, not to the transition from raw signals/images to science-ready data products. The latter can be composed of several steps, and in this context data reduction could refer to several things: that raw images were processed, that photometric measurements were performed using counts stored in the pixels, that physical properties were extracted from spectra, etc.

In the first part of the paper, following this introduction, the scale of the data analysis problems of contemporary observational astronomy is emphasized. The second section reports on available datasets and knowledge discovery procedures. In the third section an overview of feature extraction/dimensionality reduction techniques is provided, along with examples of their application to astronomical data. The fourth section is devoted to data numerosity reduction and its specific use for the visualization of astronomical data; both sampling and more sophisticated approaches are addressed. Finally, we suggest some existing algorithmic solutions for astronomical data reduction problems, identify future challenges in this domain, and provide concluding remarks.

2 Data Volume Problem in Observational Astronomy

The development of novel instruments used for producing astronomical data increases the volume of data generated each year at a rate which is twice that of Moore's law [46]. That is why the essence of contemporary observational astronomy could be accurately described by the metaphor of drinking water from a fire hose [49]. It reflects the fact that data processing algorithms have to deal with enormous amounts of data, often on a real-time basis [58]. Consequently, data reduction already occurs at a low level, in the signal/image processing phase, to bring down the size of the transferred data. It typically involves removing noise, signatures of the atmosphere and/or instrument, and other data-contaminating factors. For examples of this type of reduction one could refer to [15, 16, 44, 50].

Sky surveys represent the fundamental core of astronomy. Historically, making sky observations, plotting and monitoring with the naked eye allowed significant developments in astronomical science. Today both wide-field surveys (large data sets obtained over areas of the sky that may be at least of the order of 1% of the entire sky, e.g. see Gaia in Table 1) and deep surveys (aimed at getting important informative content from only small areas of the sky but with significant depth) represent keys to groundbreaking discoveries about the Universe.

Table 1: Selected sky surveys

Survey      Institution                         Number of objects   Type      Time frame
Hipparcos   European Space Agency               0.12M               Optical   1989-1993
Tycho-2     European Space Agency               2.5M                Optical   1989-1993
DPOSS       Caltech                             550M                Optical   1950-1990
2MASS       Univ. of Massachusetts, Caltech     300M                Near-IR   1997-2001
Gaia        European Space Agency               1000M               Optical   2013-
SDSS        Astrophysical Research Consortium   470M                Optical   2000-
LSST        LSST Corporation                    4000M               Optical   2019-

Selected recent surveys frequently approached with the use of data science tools are listed in Table 1. For a more exhaustive list of astronomical surveys one can refer to [9]. It can be noticed that the number of objects listed – even for older projects – is huge. The dimensionality of the datasets depends on the data preprocessing applied (e.g. frequency binning) but may reach thousands of attributes. The extraction of knowledge from such enormous data sets is a highly complex task. The difficulties which may occur are mainly related to limits in the processing performance of computer systems – for large-sized samples – and to problems exclusively connected with the analysis of multidimensional data. The latter arise mostly from a number of phenomena occurring in data sets of this type, known in the literature as "the curse of dimensionality". Above all, this includes the exponential growth in the sample size necessary to achieve appropriate effectiveness of data analysis methods with increasing dimension, as well as the vanishing difference between near and far points (norm concentration) under standard distance metrics [30].
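The norm concentration phenomenon is easy to demonstrate numerically. The short Python sketch below (an illustration added here, not material from the original paper) estimates the relative contrast (D_max - D_min)/D_min between the farthest and nearest neighbours of a random query point in a uniform point cloud; as the dimension grows, the contrast collapses towards zero.

```python
import numpy as np

# Illustrative sketch: how the relative contrast between the farthest
# and nearest neighbour distances, (D_max - D_min) / D_min, behaves
# as the dimensionality of the data grows.
rng = np.random.default_rng(0)

def relative_contrast(dim, n_points=1000):
    data = rng.uniform(size=(n_points, dim))      # uniform cloud in [0, 1]^dim
    query = rng.uniform(size=dim)                 # a random query point
    dists = np.linalg.norm(data - query, axis=1)  # Euclidean distances
    return (dists.max() - dists.min()) / dists.min()

for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
# The printed contrast shrinks with dim: near and far points become almost
# equidistant, which degrades distance-based data mining methods.
```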
Survey data can be explored with a variety of data science techniques. The first is outlier detection, which is aimed at identifying elements that are atypical of the whole dataset. In astronomy this technique is generally useful for discovering unusual, rare or unknown types of astronomical objects or phenomena, but also for data preprocessing [59]. Another procedure is cluster analysis: the division of the available data elements into subgroups (clusters) such that the elements belonging to each cluster are similar to each other while there exists a significant dissimilarity between elements of different clusters [33]. Identifying galaxies or groups of objects/galaxies are clustering tasks frequently performed.

The number of features can, in turn, be reduced either by choosing a subset of the original features (feature selection) or by means of constructing a reduced data set based on the initial features (feature extraction) [24, 57]. Selected methods of the latter kind that have been applied to astronomical data are listed in Table 2.

Table 2: Selected methods of dimensionality reduction used for astronomical data

Method                                Linear   Parameters   References
Principal Component Analysis          Yes      –            [27]
Kernel Principal Component Analysis   No       1            [45]
Isomap                                No       1            [48]
Locally Linear Embedding              No       1            [43]
Diffusion Maps                        No       2            [31]
Locality Preserving Projection        Yes      1            [20]
Laplacian Eigenmaps                   No       2            [3]
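To make the feature extraction idea concrete, here is a minimal sketch of applying the first method from Table 2, Principal Component Analysis, to object-based catalog data with scikit-learn. The mock catalog, its dimensions and the 95% variance threshold are assumptions made for illustration, not choices taken from the surveyed works.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic mock catalog: 10,000 objects described by 50 correlated
# features (stand-ins for magnitudes, colours, binned fluxes, ...).
rng = np.random.default_rng(42)
latent = rng.normal(size=(10_000, 5))          # 5 hidden degrees of freedom
mixing = rng.normal(size=(5, 50))
X = latent @ mixing + 0.1 * rng.normal(size=(10_000, 50))

# Standardize first: PCA is sensitive to the scales of individual features.
X_std = StandardScaler().fit_transform(X)

# Keep as many principal components as needed to retain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape)                        # (10000, k) with k << 50
print(pca.explained_variance_ratio_.sum())    # at least 0.95
```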
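The exploration procedures discussed above, outlier detection and cluster analysis, can likewise be prototyped with off-the-shelf tools. The sketch below is again only an illustration on synthetic two-dimensional data; the choice of IsolationForest and DBSCAN, and all parameter values, are assumptions rather than methods prescribed by the survey.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
# Synthetic "catalog": two object populations plus a few stray points.
group_a = rng.normal(loc=0.0, scale=0.3, size=(500, 2))
group_b = rng.normal(loc=3.0, scale=0.3, size=(500, 2))
strays = rng.uniform(low=-2.0, high=5.0, size=(10, 2))
X = np.vstack([group_a, group_b, strays])

# Outlier detection: flag atypical objects (candidates for rare phenomena).
outlier_flags = IsolationForest(random_state=0).fit_predict(X)  # -1 = outlier
print("outliers flagged:", int((outlier_flags == -1).sum()))

# Cluster analysis: group similar objects into subgroups.
labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(X)  # -1 = noise
print("clusters found:", len(set(labels) - {-1}))
```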