When Data Management Meets Cosmology and Astrophysics: Some Lessons Learned from the Petasky Project


When Data Management Meets Cosmology and Astrophysics: Some Lessons Learned from the Petasky Project
Farouk Toumani, LIMOS, CNRS, Blaise Pascal University, Clermont-Ferrand, France
Journées de l'interdisciplinarité, 10-11 December 2014, Paris, France

LSST (slide: Emmanuel Gangler, LIMOS meeting, 15/03/12)
● A 4th-generation cosmology experiment:
● An 8.4 m telescope at Cerro Pachón (Chile)
● Very-wide-field astronomy: a 9.6 deg² camera
● The whole visible sky in 6 optical bands (20,000 deg²)
● 15 s exposures, 1 visit every 3 days
● 10 years, 60 petabytes of data

Petasky (http://com.isima.fr/Petasky), a project of the Mastodons program of the Interdisciplinary Mission of CNRS
• INS2I
✦ LIMOS (UMR CNRS 6158, Clermont-Ferrand)
✦ LIRIS (UMR CNRS 5205, Lyon)
✦ LABRI (UMR CNRS 5800, Bordeaux)
✦ LIF (UMR CNRS 7279, Marseille)
✦ LIRMM (UMR CNRS 5506, Montpellier)
• IN2P3
✦ LPC (UMR CNRS 6533, Clermont-Ferrand)
✦ APC (UMR CNRS 7164, Paris)
✦ LAL (UMR CNRS 8607, Paris)
✦ Centre de Calcul de l'IN2P3/CNRS (CC-IN2P3)
• INSU
✦ LAM (UMR CNRS 7326, Marseille)

Petasky: scientific challenges
• Management of scientific data in the fields of cosmology and astrophysics
➡ Large amounts of data
➡ Complex data (e.g., images, uncertainty, multiple scales, ...)
➡ Heterogeneous formats
➡ Varied and complex processing (image analysis, reconstruction of trajectories, ad hoc queries and processing, ...)
• Scientific challenges
➡ Scalability
➡ Data integration
➡ Data analysis
➡ Visualisation
• Application context: the LSST project

Science in an exponential world
The availability of very large amounts of data, and the ability to process them efficiently, is changing the way we do science.
• Science paradigms¹
1. Empirical science: description of natural phenomena
2. Theoretical science: models and generalization
3. Computational science: simulation of complex phenomena to validate theories
4. Data-intensive science: collecting and analyzing large amounts of data
¹Jim Gray, eScience talk at the NRC-CSTB meeting, Mountain View, CA, 11 January 2007.

From astronomy to astroinformatics
• Modern digital detectors (CCDs)
• Early use of scientific computing and numerical simulations
➡ Antikythera mechanism, between 150 and 100 BC: "... a mechanical computer used for calculating lunar, solar and stellar calendars"
➡ Supernova Cosmology Project, 1986: a 1024x1024 CCD camera, 2 megabytes every five minutes
➡ International Virtual Observatory Alliance (IVOA): a Web of astronomical data
➡ Sloan Digital Sky Survey (SDSS): a 2.5 m telescope with a 54-CCD imager; started working in 2000; by 2010, a total archive of 140 TB
➡ GAIA: launched in 12/2013, scientific observations started in 7/2014
• A culture of sharing data
➡ Data with no commercial value (the field is more open than healthcare or biomedical science)

How many bytes...
1000¹ = 10³: kilo; 1000² = 10⁶: mega; 1000³ = 10⁹: giga; 1000⁴ = 10¹²: tera; 1000⁵ = 10¹⁵: peta
A single text character: 1 byte
A typewritten page: 2 kilobytes (KB)
A high-resolution photograph: 2 megabytes (MB)
The complete works of Shakespeare: 5 megabytes
A minute of high-fidelity sound: 10 megabytes
A pickup truck filled with books: 1 gigabyte (GB)
The contents of a DVD: 17 gigabytes
A collection of the works of Beethoven: 20 gigabytes
50,000 trees made into paper and printed: 1 terabyte (TB)
The print collections of the U.S. Library of Congress: 10 terabytes
All U.S. academic research libraries: 2 petabytes (PB)
All hard disk capacity developed in 1995: 20 petabytes
Source: http://searchstorage.techtarget.com/definition/How-many-bytes-for

Sizes of the astronomical datasets
[Figure: dataset sizes growing from kilobytes around 1980 to petabytes by 2010.]

E-science evolution
• Homo FTP-GREPus, in 2004¹: FTP/GREP 1 GB in a minute; FTP/GREP 1 TB in 2 days; FTP/GREP 1 PB in 3 years
• Homo Numericus: grid computing, cloud computing, virtualization, MapReduce, new hardware, NoSQL, ...
¹Where The Rubber Meets the Sky: Giving Access to Science Data, Jim Gray and Alex Szalay.

Data-driven discovery in astrophysics
Telescopes and observatories produce digitized data.
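The FTP/GREP figures above are order-of-magnitude slide estimates. A back-of-the-envelope sketch of the same arithmetic, assuming a constant sustained rate of 1 GB per minute (the slide's rounded 2-day and 3-year figures include real-world overheads the naive scaling ignores):

```python
# Back-of-the-envelope "FTP/GREP" times in the spirit of Gray & Szalay's
# 2004 numbers. The 1 GB/minute rate is an assumption, not a measurement.

RATE_BYTES_PER_S = 1e9 / 60  # ~17 MB/s, i.e. 1 GB per minute

def scan_time_seconds(num_bytes: float, rate: float = RATE_BYTES_PER_S) -> float:
    """Time to pull `num_bytes` through a link sustaining `rate` bytes/second."""
    return num_bytes / rate

for label, size in [("1 GB", 1e9), ("1 TB", 1e12), ("1 PB", 1e15)]:
    print(f"{label}: {scan_time_seconds(size) / 86400:.3f} days")
```

At this rate the naive answers are one minute per GB, about 17 hours per TB, and roughly two years per PB, which is the point of the slide: brute-force transfer-and-grep stops scaling long before the petabyte era.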
Recommended publications
  • Astrophysics with Terabytes
    Astrophysics with Terabytes
    Alex Szalay, The Johns Hopkins University; Jim Gray, Microsoft Research
    Living in an exponential world: astronomers have a few hundred TB now. One pixel (byte) per square arc second is ~4 TB; multi-spectral and temporal data push this toward 1 PB. They mine it looking for new (kinds of) objects, for more of the interesting ones (quasars), for density variations in 400-D space, and for correlations in 400-D space. Data doubles every year. [Figure: data volume versus year, 1970-2000, for glass plates and then CCDs.]
    The challenges. Exponential data growth: distributed collections, soon petabytes, spanning data collection, discovery, publishing, and analysis. New analysis paradigm: data federations; move the analysis to the data. New publishing paradigm: scientists are publishers and curators.
    Why is astronomy special? It is especially attractive to the wide public, though the community is not very large. It has no commercial value: there are no privacy concerns, results are freely shared with others, and it is great for experimenting with algorithms. It is real and well documented: high-dimensional (with confidence intervals), spatial, temporal. It is diverse and distributed: many different instruments from many different places and many different times. The questions are interesting, and there is a lot of it (soon petabytes).
    The Virtual Observatory. Premise: most data is (or could be) online. The Internet is the world's best telescope: it has data on every part of the sky, in every measured spectral band (optical, x-ray, radio, ...), as deep as the best instruments of two years ago; it is up when you are up; the "seeing" is always great; and it is a smart telescope, linking objects and data to the literature on them. Software became the capital expense: share, standardize, reuse.
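The "move the analysis to the data" idea can be sketched with toy SQL: ship the query to the database and move only the small result back. The table name, schema, and values below are invented for illustration, not the real SDSS schema:

```python
import sqlite3

# A toy "SkyServer-style" catalog held where the data lives.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE photoobj (objid INTEGER, ra REAL, dec REAL, r_mag REAL)")
con.executemany(
    "INSERT INTO photoobj VALUES (?, ?, ?, ?)",
    [(1, 10.1, -0.5, 17.2), (2, 10.4, -0.2, 21.9), (3, 11.0, 0.3, 19.5)],
)

# The analysis runs server-side as SQL; only the filtered rows move.
rows = con.execute(
    "SELECT objid, r_mag FROM photoobj WHERE r_mag < 20 ORDER BY r_mag"
).fetchall()
print(rows)  # [(1, 17.2), (3, 19.5)]
```

The design point is that the predicate travels to the terabytes, instead of the terabytes travelling to the predicate.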
  • Scientific Data Mining in Astronomy
    SCIENTIFIC DATA MINING IN ASTRONOMY
    Kirk D. Borne, Department of Computational and Data Sciences, George Mason University, Fairfax, VA 22030, USA. [email protected]. arXiv:0911.0505v1 [astro-ph.IM] 3 Nov 2009.
    Abstract. We describe the application of data mining algorithms to research problems in astronomy. We posit that data mining has always been fundamental to astronomical research, since data mining is the basis of evidence-based discovery, including classification, clustering, and novelty discovery. These algorithms represent a major set of computational tools for discovery in large databases, which will be increasingly essential in the era of data-intensive astronomy. Historical examples of data mining in astronomy are reviewed, followed by a discussion of one of the largest data-producing projects anticipated for the coming decade: the Large Synoptic Survey Telescope (LSST). To facilitate data-driven discoveries in astronomy, we envision a new data-oriented research paradigm for astronomy and astrophysics: astroinformatics. Astroinformatics is described as both a research approach and an educational imperative for modern data-intensive astronomy. An important application area for large time-domain sky surveys (such as LSST) is the rapid identification, characterization, and classification of real-time sky events (including moving objects, photometrically variable objects, and the appearance of transients). We describe one possible implementation of a classification broker for such events, which incorporates several astroinformatics techniques: user annotation, semantic tagging, metadata markup, heterogeneous data integration, and distributed data mining. Examples of these types of collaborative classification and discovery approaches within other science disciplines are presented.
    1 Introduction. It has been said that astronomers have been doing data mining for centuries: "the data are mine, and you cannot have them!".
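As a toy illustration of the "novelty discovery" the abstract mentions, a simple k-nearest-neighbour distance score flags unusually isolated points as candidate anomalies. The data and the scoring rule here are illustrative stand-ins, not the paper's algorithms:

```python
import math

def novelty_scores(points, k=3):
    """Score each point by its mean distance to its k nearest neighbours.
    Isolated points (large score) are candidate novelties/anomalies."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# A tight cluster of "ordinary" objects plus one isolated, transient-like point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = novelty_scores(data)
print(max(range(len(data)), key=lambda i: scores[i]))  # 4, the isolated point
```

Classification and clustering work the same way at scale: distances in a feature space, computed close to the data.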
  • Things You Need to Know About Big Data
    Things You Need to Know About Big Data
    By Joseph Marks
    Underwritten by NetApp ("Critical Missions Are Built On NetApp"): NetApp's Big Data solutions deliver high-performance computing, full-motion video (FMV), and intelligence, surveillance and reconnaissance (ISR) capabilities to support national safety, defense and intelligence missions and scientific research. To learn more about how NetApp and VetDS can solve your big data challenges, call us at 919.238.4715 or visit us online at VetDS.com. ©2011 NetApp. All rights reserved. Specifications are subject to change without notice. NetApp, the NetApp logo, and Go further, faster are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.
    Big data has been making big news in fields ranging from astronomy to online advertising. The term big data can be difficult to pin down because it shows up in so many places. Facebook crunches through big data on your user profile and friend network to deliver micro-targeted ads. Google does the same thing with Gmail messages, Search requests and YouTube browsing. Companies including IBM are sifting through big data from satellites, the Global Positioning System and computer networks to cut down on traffic jams and reduce carbon emissions in cities. And researchers are parsing big data produced by the Hubble Space Telescope, the Large Hadron Collider and numerous other sources to learn more about the nature and origins of the universe.
  • Data Quality Through Data Integration: How Integrating Your IDEA Data Will Help Improve Data Quality
    Center for the Integration of IDEA Data
    Data Quality Through Data Integration: How Integrating Your IDEA Data Will Help Improve Data Quality
    July 2018. Authors: Sara Sinani & Fred Edora
    High-quality data is essential when looking at student-level data, including data specifically focused on students with disabilities. For state education agencies (SEAs), it is critical to have a solid foundation for how data is collected and stored in order to achieve high-quality data. The process of integrating Individuals with Disabilities Education Act (IDEA) data into a statewide longitudinal data system (SLDS) or other general-education data system not only provides SEAs with more complete data, but also helps SEAs improve the accuracy of federal reporting, increase the quality of and access to data within and across data systems, and make better-informed policy decisions related to students with disabilities. Through the data integration process, including mapping data elements, reviewing data governance processes, and documenting business rules, SEAs will have developed documented processes and policies that result in more integral data that can be used with more confidence. In this brief, the Center for the Integration of IDEA Data (CIID) provides scenarios based on the continuum of data integration, focusing on three specific scenarios along the integration continuum to illustrate how a robust integrated data system improves the quality of data.
    Scenario A: Siloed Data Systems. Data systems where the data is housed in separate systems and databases are often referred to as "data silos."¹ These data silos are isolated from other data systems and databases.
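A minimal sketch of the integration idea: joining two "siloed" record sets on a shared student ID both completes the picture and surfaces data-quality problems, such as records that exist in one system only. All names, IDs, and field names below are invented for illustration:

```python
# Two siloed systems keyed by a shared (hypothetical) student ID.
enrollment = {101: {"name": "A. Lee", "grade": 5},
              102: {"name": "B. Cruz", "grade": 7}}
idea = {101: {"disability_category": "SLD", "setting": "regular"},
        103: {"disability_category": "OHI", "setting": "resource"}}

# Integration merges fields across systems for every known student...
integrated = {sid: {**enrollment.get(sid, {}), **idea.get(sid, {})}
              for sid in enrollment.keys() | idea.keys()}

# ...and exposes a quality signal: IDEA records with no enrollment match.
orphans = idea.keys() - enrollment.keys()
print(sorted(orphans))  # [103]
```

This is the "mapping data elements" step in miniature: once both systems speak through one keyed structure, mismatches become queryable instead of invisible.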
  • Managing Data in Motion: Data Integration Best Practice Techniques and Technologies
    Managing Data in Motion: Data Integration Best Practice Techniques and Technologies
    April Reeve
    Morgan Kaufmann is an imprint of Elsevier, 225 Wyman Street, Waltham, MA 02451, USA. Copyright © 2013 Elsevier Inc. All rights reserved.
    Acquiring Editor: Andrea Dierna. Development Editor: Heather Scherer. Project Manager: Mohanambal Natarajan. Designer: Russell Purdy.
    No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions. This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
    Notices: Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
  • Informatica Cloud Data Integration
    Informatica® Cloud Data Integration: Microsoft SQL Server Connector Guide
    March 2019. © Copyright Informatica LLC 2017, 2019
    This software and documentation are provided only under a separate license agreement containing restrictions on use and disclosure. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica LLC.
    U.S. GOVERNMENT RIGHTS: Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, duplication, disclosure, modification, and adaptation is subject to the restrictions and license terms set forth in the applicable Government contract, and, to the extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License.
    Informatica, the Informatica logo, Informatica Cloud, and PowerCenter are trademarks or registered trademarks of Informatica LLC in the United States and many jurisdictions throughout the world. A current list of Informatica trademarks is available on the web at https://www.informatica.com/trademarks.html. Other company and product names may be trade names or trademarks of their respective owners. Portions of this software and/or documentation are subject to copyright held by third parties. Required third party notices are included with the product. See patents at https://www.informatica.com/legal/patents.html.
    DISCLAIMER: Informatica LLC provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of noninfringement, merchantability, or use for a particular purpose.
  • Collaborative Data-Driven Science (Gerard Lemson, Alex Szalay, Mike Rippin; DIBBs/SciServer)
    Collaborative Data-Driven Science
    Gerard Lemson, Alex Szalay, Mike Rippin (DIBBs/SciServer)
    Started with the SDSS SkyServer, built in a few months in 2001. Goal: instant access to rich content. Idea: bring the analysis to the data, with interactive access at the core. Much of the scientific process is about data: data collection, data cleaning, data archiving, data organization, data publishing, mirroring, data distribution, data analytics, data curation, ...
    Slide topics: form-based queries; image access; custom SQL; batch queries and MyDB; cosmological simulations; the turbulence database; web service access through Python.
    Goals: interactive science on petascale data; sustain and enhance our astronomy effort; create scalable open numerical laboratories; scale the system to many petabytes; deep integration with the "long tail"; a large footprint across many disciplines (also genomics, oceanography, materials science); use commonly shared building blocks; major national and international impact.
    Approach: offer more computing resources server-side; augment and combine SQL queries with easy-to-use scripting tools; heavy use of virtual machines; an interactive portal via iPython/Matlab/R; batch jobs; enhanced visualization tools.
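The "web service access through Python" pattern can be sketched generically: package a SQL query as a job and POST it to a query service. The endpoint URL and payload shape below are placeholders, not the real SciServer/CasJobs REST API:

```python
import json
from urllib import request

# Hypothetical job-submission endpoint; the real service's URL and payload differ.
SERVICE_URL = "https://example.org/casjobs/submit"

def build_job(sql: str, target_table: str) -> request.Request:
    """Wrap a SQL query and a MyDB-style destination table into a POST request."""
    payload = json.dumps({"query": sql, "into": target_table}).encode()
    return request.Request(SERVICE_URL, data=payload,
                           headers={"Content-Type": "application/json"})

req = build_job("SELECT TOP 10 objid FROM photoobj", "mydb.sample")
print(req.get_method())  # POST
```

The batch/MyDB idea is visible in the payload: results land in a server-side scratch table rather than streaming petabytes back to the client.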
  • What to Look for When Selecting a Master Data Management Solution
    What to Look for When Selecting a Master Data Management Solution
    Table of contents: Business Drivers of MDM; Next-Generation MDM; Keys to a Successful MDM Deployment; Choosing the Right Solution for MDM; Talend's MDM Solution; Conclusion.
    Master data management (MDM) exists to enable the business success of an organization. With the right MDM solution, organizations can successfully manage and leverage their most valuable shared information assets. Because of the strategic importance of MDM, organizations can't afford to make any mistakes when choosing their MDM solution. This article will discuss what to look for in an MDM solution and why the right choice will help to drive strategic business initiatives, from cloud computing to big data to next-generation business agility. It is critical for CIOs and business decision-makers to understand the value
  • Harnessing Grid Resources to Enable the Dynamic Analysis of Large Astronomy Datasets Year 1 Progress Report & Year 2 Proposal
    January 27, 2007. NASA GSRP Proposal. Ioan Raicu
    Harnessing Grid Resources to Enable the Dynamic Analysis of Large Astronomy Datasets: Year 1 Progress Report & Year 2 Proposal
    1 Year 1 Proposal. To set up the context for this progress report, this section covers a brief motivation for our work and summarizes the Year 1 Proposal we originally submitted under grant number NNA06CB89H. Large datasets are being produced at a very fast pace in the astronomy domain. In principle, these datasets are most valuable if and only if they are made available to the entire community, which may have tens to thousands of members. The astronomy community will generally want to perform various analyses on these datasets to extract new science and knowledge that will both justify the investment in the original acquisition of the datasets and provide a building block for other scientists and communities to build upon in the general quest for knowledge. Grid computing has emerged as an important new field focusing on large-scale resource sharing and high-performance orientation. The Globus Toolkit, the "de facto standard" in grid computing, offers much of the middleware infrastructure required to realize large-scale distributed systems. We proposed to develop a collection of Web Services-based systems that use grid computing to federate large computing and storage resources for dynamic analysis of large datasets. We proposed to build a Globus Toolkit 4 based prototype named the "AstroPortal" that would support the "stacking" analysis on the Sloan Digital Sky Survey (SDSS).
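"Stacking" here means co-adding many registered image cutouts of the same sky position so that independent pixel noise averages down by roughly sqrt(N), letting faint sources emerge. A minimal NumPy sketch on toy data, not the AstroPortal implementation:

```python
import numpy as np

def stack_cutouts(cutouts):
    """Mean-stack aligned image cutouts. Assumes the cutouts are already
    registered; averaging N frames suppresses independent noise by ~sqrt(N)."""
    return np.mean(np.stack(cutouts), axis=0)

rng = np.random.default_rng(0)
truth = np.zeros((8, 8))
truth[4, 4] = 1.0  # a faint point source, buried in per-frame noise below
frames = [truth + rng.normal(0, 0.5, truth.shape) for _ in range(100)]

stacked = stack_cutouts(frames)
# Per-frame noise sigma is 0.5; after 100 frames the residual is ~0.05,
# so the source pixel now stands far above the background.
print(round(float(stacked[4, 4]), 1))
```

The grid angle in the proposal is that each cutout fetch and partial sum can run where the data sits, with only small stacked results shipped back.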
  • Using ETL, EAI, and EII Tools to Create an Integrated Enterprise
    Data Integration: Using ETL, EAI, and EII Tools to Create an Integrated Enterprise
    Colin White, Founder, BI Research. TDWI Webcast, October 2005. Copyright © BI Research 2005.
    Slide topics: the TDWI data integration study; data integration as a barrier to application development; the top three data integration inhibitors; staffing and budget for data integration; enterprise business data.
    Data integration, a definition: a framework of applications, products, techniques and technologies for providing a unified and consistent view of enterprise-wide business data.
    Data integration architecture (flattened diagram): source and target business data, domain-dispersed and integrated, internal and external; master data management (MDM) applications; data integration techniques: data propagation, data consolidation, data federation, changed data capture (CDC), and data transformation (restructure, cleanse, reconcile, aggregate); data integration technologies: enterprise data replication (EDR), extract-transform-load (ETL), enterprise content management (ECM), enterprise application integration (EAI), right-time ETL (RT-ETL), enterprise information integration (EII), and web services (service-oriented architecture, SOA); data integration management: data quality management, metadata management, systems management.
    Data integration techniques and technologies: data consolidation into centralized data via extract and transformation.
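A toy end-to-end pass through the ETL technique named above: extract from a source, transform (cleanse and standardize), then load into a target and query it. The data, schema, and field names are invented for illustration:

```python
import csv
import io
import sqlite3

# "Source system": an in-memory CSV standing in for a real extract.
SOURCE = "id,amount,region\n1, 10.5 ,EMEA\n2,7.25,apac\n"

rows = list(csv.DictReader(io.StringIO(SOURCE)))      # extract
for r in rows:                                        # transform:
    r["amount"] = float(r["amount"].strip())          #   cleanse stray spaces
    r["region"] = r["region"].strip().upper()         #   standardize codes

con = sqlite3.connect(":memory:")                     # load into the target
con.execute("CREATE TABLE sales (id INTEGER, amount REAL, region TEXT)")
con.executemany("INSERT INTO sales VALUES (:id, :amount, :region)", rows)

print(con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall())  # [('APAC', 7.25), ('EMEA', 10.5)]
```

Federation (EII) and propagation (EAI/CDC) differ only in where this pipeline runs and when: at query time across live sources, or continuously as changes occur.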
  • Warmth Elevating the Depths: Shallower Voids with Warm Dark Matter
    MNRAS 451, 3606–3614 (2015), doi:10.1093/mnras/stv1087
    Warmth elevating the depths: shallower voids with warm dark matter
    Lin F. Yang,¹ Mark C. Neyrinck,¹ Miguel A. Aragón-Calvo,² Bridget Falck³ and Joseph Silk¹,⁴,⁵
    ¹Department of Physics & Astronomy, The Johns Hopkins University, 3400 N Charles Street, Baltimore, MD 21218, USA
    ²Department of Physics and Astronomy, University of California, Riverside, CA 92521, USA
    ³Institute of Cosmology and Gravitation, University of Portsmouth, Dennis Sciama Building, Burnaby Rd, Portsmouth PO1 3FX, UK
    ⁴Institut d'Astrophysique de Paris, 98 bis boulevard Arago, F-75014 Paris, France
    ⁵Beecroft Institute of Particle Astrophysics and Cosmology, Department of Physics, University of Oxford, Denys Wilkinson Building, 1 Keble Road, Oxford OX1 3RH, UK
    Accepted 2015 May 12. Received 2015 May 1; in original form 2014 December 11.
    ABSTRACT: Warm dark matter (WDM) has been proposed as an alternative to cold dark matter (CDM), to resolve issues such as the apparent lack of satellites around the Milky Way. Even if WDM is not the answer to observational issues, it is essential to constrain the nature of the dark matter. The effect of WDM on haloes has been extensively studied, but the small-scale initial smoothing in WDM also affects the present-day cosmic web and voids. It suppresses the cosmic "sub-web" inside voids, and the formation of both void haloes and subvoids. In N-body simulations run with different assumed WDM masses, we identify voids with the ZOBOV algorithm, and cosmic-web components with the ORIGAMI algorithm.
  • Introduction to Data Integration Driven by a Common Data Model
    Introduction to Data Integration Driven by a Common Data Model
    Michal Džmuráň, Senior Business Consultant, Progress Software Czech Republic
    Index: Data Integration Driven by a Common Data Model; Data Integration Engine; What Is Semantic Integration?; What Is a Common Data Model? Common Data Model Examples; What Is Data Integration Driven by a Common Data Model?; The Position of Integration Driven by a Common Data Model in the Overall Application Integration Architecture; Which Organisations Would Benefit from Data Integration Driven by a Common Data Model?; Key Elements of Data Integration Driven by a Common Data Model; Common Data Model, Data Services and Data Sources; Mapping and Computed Attributes; Rules; Glossary of Terms; Information Resources (Web; Articles and Analytical Reports; Literature; Industry Standards for Common Data Models); Contact; References.
    Data Integration Engine. Today, not even small organisations can make do with a single application. The majority of business processes in an organisation are nowadays supported by some kind of implemented application, and the organisation must focus on making its operations more efficient. One way of achieving this goal is to optimize the exchange of information across applications, assisted by data integration between the various applications; collectively, this is known as Enterprise Application Integration (EAI). Without effective Enterprise Application Integration, a modern organisation cannot run its business processes to meet ever-increasing customer demands and keep up-to-date knowledge of its operations. Data integration is an important part of EAI.
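The "mapping and computed attributes" idea can be sketched as one per-source transformation onto a single common model: each source keeps its own schema, and only the mapping functions know about it. The field names and mapping rules below are invented examples:

```python
# A (hypothetical) common data model: every source maps into this shape.
COMMON_FIELDS = ("customer_id", "full_name", "email")

def from_crm(rec):
    """Map a CRM record; the name is computed from two source fields."""
    return {"customer_id": rec["crm_id"],
            "full_name": f'{rec["first"]} {rec["last"]}',
            "email": rec["mail"].lower()}

def from_billing(rec):
    """Map a billing record; it already stores one display name."""
    return {"customer_id": rec["account"],
            "full_name": rec["display_name"],
            "email": rec["email"].lower()}

crm_rec = {"crm_id": 7, "first": "Ada", "last": "Lovelace", "mail": "Ada@X.org"}
bill_rec = {"account": 7, "display_name": "Ada Lovelace", "email": "ada@x.org"}

a, b = from_crm(crm_rec), from_billing(bill_rec)
print(a == b)  # True: both sources agree once mapped to the common model
```

Consumers then integrate against `COMMON_FIELDS` alone, so adding a third source means writing one new mapping function, not touching every consumer.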