When data management meets astronomy: some lessons learned from the Petasky project

Farouk Toumani LIMOS, CNRS, Blaise Pascal University Clermont-Ferrand, France

● A fourth-generation cosmology experiment

● 8.4 m telescope

● Cerro Pachón (Chile)

● Very wide-field astronomy: 9.6 deg² camera

● The whole visible sky in 6 optical bands (20,000 deg²)

● 15 s exposures, 1 visit every 3 days

● 10 years, 60 PB of data

(Journées de l'interdisciplinarité, 10-11 December 2014, Paris, France; http://com.isima.fr/Petasky)

Mastodons: a program of the Interdisciplinary Mission of CNRS

• INS2I
  ✦ LIMOS (UMR CNRS 6158, Clermont-Ferrand)
  ✦ LIRIS (UMR CNRS 5205, Lyon)
  ✦ LABRI (UMR CNRS 5800, Bordeaux)
  ✦ LIF (UMR CNRS 7279, Marseille)
  ✦ LIRMM (UMR CNRS 5506, Montpellier)
• IN2P3
  ✦ LPC (UMR CNRS 6533, Clermont-Ferrand)
  ✦ APC (UMR CNRS 7164, Paris)
  ✦ LAL (UMR CNRS 8607, Paris)
  ✦ Centre de Calcul de l'IN2P3/CNRS (CC-IN2P3)
• INSU
  ✦ LAM (UMR CNRS 7326, Marseille)

Petasky: scientific challenges

• Management of scientific data in the fields of cosmology and astrophysics

➡ Large amount of data

➡ Complex data (e.g., images, uncertainty, multiple scales, ...)

➡ Heterogeneous formats

➡ Varied and complex processing (image analysis, trajectory reconstruction, ad-hoc queries and processing, …)

• Scientific challenges

➡ Scalability

➡ Visualisation

• Application context: the LSST project

Science in an exponential world

The availability of very large amounts of data and the ability to process them efficiently are changing the way we do science.

• Science paradigms¹

1. Empirical description of natural phenomena

2. Theoretical science: models and generalization

3. Computational science: simulation of complex phenomena to validate theories

4. Data-intensive science: collecting and analyzing large amounts of data

¹Jim Gray, eScience talk at the NRC-CSTB meeting, Mountain View, CA, 11 January 2007.

From astronomy to astroinformatics

• Early use of scientific computing and numeric simulations
  ➡ The Antikythera mechanism, between 150 and 100 BC: « … a mechanical computer used for calculating lunar, solar and stellar calendars »
• Modern digital detectors (CCDs)
  ➡ Supernovae Cosmology Project, 1986
    - 1024x1024 CCD camera, 2 megabytes every five minutes
  ➡ Sloan Digital Sky Survey (SDSS)
    - 2.5 m telescope, 54-CCD imager
    - Started working in 2000
    - In 2010, a total archive of 140 TB
  ➡ GAIA, launched in 12/2013; started scientific observations in 7/2014
• A culture of sharing data
  ➡ International Virtual Observatory Alliance (IVOA): a web of astronomical data
  ➡ Data with non-commercial value (more open than the healthcare or biomedical fields)

How many bytes…

1000^1 = 10^3    kilo (KB)
1000^2 = 10^6    mega (MB)
1000^3 = 10^9    giga (GB)
1000^4 = 10^12   tera (TB)
1000^5 = 10^15   peta (PB)

A single text character                                  1 byte
A typewritten page                                       2 kilobytes
A high-resolution photograph                             2 megabytes
The complete works of Shakespeare                        5 megabytes
A minute of high-fidelity sound                          10 megabytes
A pickup truck filled with books                         1 gigabyte
The contents of a DVD                                    17 gigabytes
A collection of the works of Beethoven                   20 gigabytes
50,000 trees made into paper and printed                 1 terabyte
The print collections of the U.S. Library of Congress    10 terabytes
All U.S. academic research libraries                     2 petabytes
All hard disk capacity developed in 1995                 20 petabytes

(Source: http://searchstorage.techtarget.com/definition/How-many-bytes-for)

Sizes of the astronomical datasets

[Figure: sizes of astronomical datasets, growing from the KB-MB range around 1980 to the PB range by 2010]

E-science evolution

• Homo FTP-GREPus
  In 2004¹: FTP/GREP 1 GB in a minute; FTP/GREP 1 TB in 2 days; FTP/GREP 1 PB in 3 years
• Homo Numericus
  Grid computing, cloud computing, virtualization, MapReduce, new hardware, NoSQL, …

¹Where the Rubber Meets the Sky: Giving Access to Science Data, Jim Gray and Alex Szalay

Data-driven discovery in Astrophysics

Telescopes and observatories → digitized data → processing pipeline → information/knowledge
• Dedicated archives serve science, education and the public

[Figure: components of a classical DBMS (Composants d'un SGBD): query compiler, DDL compiler, transaction manager, execution engine, index/file/record manager, buffer manager, storage manager, logging and recovery, concurrency control; overlaid on the data-driven discovery pipeline]

• Astrometric pipeline: outputs object positions

• Spectroscopic pipeline: redshifts, classification of objects, …

The LSST project

Large Synoptic Survey Telescope: a new window on the sky
● A fourth-generation cosmology experiment
● 8.4 m telescope
● Cerro Pachón (Chile)
● Very wide-field astronomy: 9.6 deg² camera
● The whole visible sky in 6 optical bands (20,000 deg²)
● 15 s exposures, 1 visit every 3 days
● 10 years, 60 PB of data

LSST project organisation
● Non-profit corporation
● US: 33 partners; $670M
● Chile: site
● France: IN2P3 (~€15M)
● Telescope, camera, outreach

"The data volumes […] of LSST are so large that the limitation on our ability to do science isn't the ability to collect the data, it's the ability to understand […] the data" (Andrew Connolly, U. Washington)

"How do you turn petabytes of data into scientific knowledge?" (Kirk Borne, George Mason U.)

Data management challenges in LSST

"How much the (LSST) project will tell us about our solar system, the dark energy problem and more, will depend on how well we can process the information the telescope and its camera send back to us - an estimated sum of around ten petabytes of data per year." (Mari Silbey, Space: the big data frontier, http://www.smartplanet.com/blog/thinking-tech/space-the-big-data-frontier/12180)

"Plans for sharing the data from LSST with the public are as ambitious as the telescope itself." Anyone with a computer will be able to fly through the Universe, zooming past objects a hundred million times fainter than can be observed with the unaided eye. The LSST project will provide analysis tools to enable both students and the public to participate in the process of scientific discovery.

LSST data scales

Tens of thousands of billions of photometric observations over tens of billions of objects

• 1-10 million events per night, 3 billion sources
• 16 TB every 8 hours, at a rate of 540 MB/s
• Images
  - 12.8 GB every 39 seconds (a data rate of 330 MB/s)
  - 100 PB final archive of images
  - for comparison, the highest data rate among current astronomical surveys is 4.3 MB/s, for the Sloan Digital Sky Survey (SDSS)
• Transients
  - 1-3 PB; a relation with 100 attributes and 5,000 billion tuples
• Object catalog
  - a relation with 500 attributes and 40 billion tuples
  - 100-200 TB
• Estimates for the end of the project
  - 400,000 billion tuples (different versions of the data, in addition to replication)
  - raw data: 60 PB
  - catalog database: 15 PB
  - total volume after processing: several hundred PB
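As a quick consistency check, the two quoted rates follow directly from the volumes above (the small gap with the quoted 540 MB/s presumably comes from rounding):

  \frac{16\ \mathrm{TB}}{8\ \mathrm{h}} = \frac{16 \times 10^{6}\ \mathrm{MB}}{28\,800\ \mathrm{s}} \approx 556\ \mathrm{MB/s},
  \qquad
  \frac{12.8\ \mathrm{GB}}{39\ \mathrm{s}} \approx 0.33\ \mathrm{GB/s} \approx 330\ \mathrm{MB/s}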

LSST data scales

                          LSST year 1   LSST year 10
Raw data                  6 PB          60 PB
Archive                   19 PB         270 PB
Disk (DAC)                16 PB         90 PB
DB                        0.5 PB        5 PB (baseline)
Moore equivalent (2014)   1.2 TB        12 TB

Table                      Size      #tuples   #attributes
Object                     109 TB    38 B      470
Moving Object              5 GB      6 M       100
Source                     3.6 PB    5 T       125
Forced Source              1.1 PB    32 T      7
Difference Image Source    71 TB     200 B     65
CCD Exposure               0.6 TB    17 B      45

LSST data scales

A change of scale from TB to PB

Data volumes and rates are unprecedented in astronomy

[Figure: relative étendue (m² deg²) of LSST compared with PS4, PS1, Subaru, CFHT, SDSS, MMT, DES, 4m IR, VST and VISTA; all facilities assumed operating 100% in one survey]
[Figure: estimated nightly data volume (GB), raw and catalog, for LSST, Pan-STARRS 4 and SDSS]
(Steven M. Kahn, SLUO meeting presentation, June 7, 2007)

LSST will make tens of trillions of photometric observations of tens of billions of objects.

Queries per difficulty level: supported queries

• Retrieve any type of information about a single object (identified by a given objectId), including the full time series (a few seconds):
  SELECT * FROM Object JOIN Source USING (objectId) WHERE objectId = 293848594;
• Retrieve any type of information about a group of objects in a small area of the sky, including neighborhood-type queries (≃ 1 hour):
  SELECT * FROM Object WHERE qserv_areaSpec_circle(1.0, 35.0, 5.0/60);
• Analyse light curves across a large area (≃ 1 day, 24 h):
  SELECT O.objectId, myFunction(S.taiMidPoint, S.psfFlux)
  FROM Object AS O JOIN Source AS S USING (objectId)
  WHERE O.varProb > 0.75
  GROUP BY O.objectId;
• Analyse light curves of faint objects across a large area (≃ 1 week):
  SELECT O.objectId, myFunction(V.taiMidPoint, FS.flux)
  FROM Object AS O
  JOIN ForcedSource AS FS ON (O.objectId = FS.objectId)
  JOIN Visit AS V ON (FS.visitId = V.visitId);

Queries per difficulty level: expensive/impossible queries

• Expensive queries
  - Find objects far away from other objects (for a large number of objects). Question: what is the largest distance we should plan to support for distance-based queries involving (a) a small number of objects, (b) all objects on the sky?
  - Sliding-window queries: find all 5 arcmin × 5 arcmin regions with an object density higher than ρ
• Impossible queries
  - Large result sizes
    • Select all pairs of stars within 1 arcmin of each other in the Milky Way region
  - Expensive or hidden computation (e.g., joins)
    • Near-neighbor queries on the Source or ForcedSource table
    • Joining large tables between different LSST data releases
    • Time-series analysis of every object
    • Cross-match with a very large external catalog (e.g., LSST with SKA)
    • Any non-spatial join on the entire catalog (Object, Source, ForcedSource)
    • Join of Source with ForcedSource

Examples of User Defined Functions

• q3c_ang2ipix(ra, dec) -- returns the ipix value at (ra, dec)
• q3c_dist(ra1, dec1, ra2, dec2) -- returns the distance in degrees between (ra1, dec1) and (ra2, dec2)
• q3c_join(ra1, dec1, ra2, dec2, radius) -- returns true if (ra1, dec1) is within radius spherical distance of (ra2, dec2); it should be used when the index on q3c_ang2ipix(ra2, dec2) has been created
• q3c_ellipse_join(ra1, dec1, ra2, dec2, major, ratio, pa) -- like q3c_join, except that (ra1, dec1) has to be within an ellipse with major axis major, axis ratio ratio and position angle pa (from north through east)
• q3c_radial_query(ra, dec, center_ra, center_dec, radius) -- returns true if (ra, dec) is within radius degrees of (center_ra, center_dec); this is the main function for cone searches, and should be used when the index on q3c_ang2ipix(ra, dec) has been created
• q3c_ellipse_query(ra, dec, center_ra, center_dec, maj_ax, axis_ratio, PA) -- returns true if (ra, dec) is within the ellipse centered on (center_ra, center_dec), specified by its major axis, axis ratio and position angle; should be used when the index on q3c_ang2ipix(ra, dec) has been created
• q3c_poly_query(ra, dec, poly) -- returns true if (ra, dec) is within the PostgreSQL polygon poly
• q3c_ipix2ang(ipix) -- returns a 2-array (ra, dec) corresponding to ipix
• q3c_pixarea(ipix, bits) -- returns the area, in steradians, corresponding to ipix at level bits (1 is smallest, 30 is the cube face)
• q3c_ipixcenter(ra, dec, bits) -- returns the ipix value of the pixel center, at the given depth, covering the specified (ra, dec)
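As an illustration of how these UDFs are used in practice, here is a minimal sketch of a cone search and a positional cross-match in PostgreSQL with Q3C; the table names (stars, candidates) and the chosen radii are hypothetical, only the q3c_* calls come from the list above.

  -- Functional index that q3c_radial_query and q3c_join rely on.
  CREATE INDEX stars_q3c_idx ON stars (q3c_ang2ipix(ra, dec));

  -- Cone search: every row of stars within 0.5 degrees of (ra, dec) = (11.04, 40.96).
  SELECT *
  FROM stars
  WHERE q3c_radial_query(ra, dec, 11.04, 40.96, 0.5);

  -- Positional cross-match: each candidate paired with the stars closer than 1 arcsec;
  -- the indexed table supplies the second pair of coordinates, as noted above.
  SELECT c.id, s.id, q3c_dist(c.ra, c.dec, s.ra, s.dec) AS dist_deg
  FROM candidates AS c
  JOIN stars AS s
    ON q3c_join(c.ra, c.dec, s.ra, s.dec, 1.0/3600);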

Main thesis of this talk

"How to form a new generation of scientists capable of exploiting the new technologies to pursue science goals at an unprecedented scale?" (G. Longo¹)

It should not be up to the scientists but to the technology (the data management system) to overcome the computing barriers between them and the data.

Analysts instead of hackers.

This approach has been very successful in the business domain but, in general, not so successful in the scientific domain.

¹Talk at the workshop « New challenges in astro- and environmental informatics in the Big Data era », May 2014, Szombathely, Hungary

What makes DB technology successful in the business domain?

• Abstractions
  ✓ Relations instead of files, blocks, tablespaces, segments, extents and access paths
  ✓ Relational algebra instead of algorithms
• Declarative
  ✓ Express what you want, not how to get it
• Optimization
  ✓ Rather naive techniques, but enough for the business world

Example: query compilation

SELECT Name
FROM Employé E, Département D, Projet P
WHERE E.empno = D.manager AND E.empno = P.mgr;

Example: query compilation (cont.): execution plan

[Figure: the execution plan chosen for the query: an index nested-loop join (on A.x = C.x) combining an index scan of Projet (C) with a merge join (on A.x = B.x) of sorted table scans of Employé (A) and Département (B)]
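To make the compilation step concrete, a hedged aside: most relational engines (for example PostgreSQL, or the MySQL backend mentioned later for Qserv) expose the plan chosen by the optimizer through EXPLAIN. The toy schema below is hypothetical and only mirrors the example query; it is not taken from any real system.

  -- Hypothetical tables mirroring the example (unaccented names for portability).
  CREATE TABLE Employe     (empno INT PRIMARY KEY, Name VARCHAR(100));
  CREATE TABLE Departement (deptno INT PRIMARY KEY, manager INT);
  CREATE TABLE Projet      (projno INT PRIMARY KEY, mgr INT);

  -- Show the chosen execution plan (scan methods, join order, join algorithms)
  -- without actually running the query.
  EXPLAIN
  SELECT Name
  FROM Employe E, Departement D, Projet P
  WHERE E.empno = D.manager AND E.empno = P.mgr;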

Petasky: data management challenge

Techniques to build an efficient and easy-to-use data access system at a reasonable cost:
• Specialized hardware vs. commodity machines
• Programming vs. querying
• Ad-hoc optimization vs. a generic system

Space of solutions and associated challenges

Clearly beyond the capacities of centralized systems

• Distributed and parallel systems
  ✓ Data distribution
  ✓ Computation distribution
  ✓ Failure resilience
• Storage model
  ✓ Row store vs. column store
  ✓ (Sophisticated) indexes
• Benefit from modern hardware
• Complexity theory and cost models
  ✓ Standard measures: I/O, data transfer, …
  ✓ Cost of coordination

Explored approaches

• Big data approaches
  ✓ Distributed and parallel systems
    ➡ MapReduce-like approaches (shared-nothing architecture)
    ➡ Parallel DBMSs (shared-everything architecture)
    ➡ Spatial partitioning (Qserv for LSST)
  ✓ Column-store DBMSs
    ➡ Vertica, MonetDB, …
• Data integration to the rescue
  ➡ Declarative approach

Qserv: a solution developed at SLAC (Stanford)

Orchestration tool:
● SQL parser
● Metadata database
● Geometric user-defined functions
● xrootd communication
● MySQL backend
● Result aggregation

Partitioning:
● Geometric (for neighborhood queries)
● Objects and their sources on the same node

Practical limits on:
● Which joins are possible
● Result size
● Computation time

PetaSky studies on Qserv:
● Large-scale test (at CC-IN2P3): 300 physical nodes with 120 GB of disk and 16 GB of RAM each; 15 TB in 3,000 chunks, i.e. about 50 GB per node
● Parallelized deployment: disk space = 2× the database size
● Overhead tests (with tables in cache)
● Performance instabilities when going from 150 to 300 nodes (communications, multi-threading)
● Response time roughly proportional to the number of nodes

• Limitations
  ✓ Limited queries (distances of ~1 arcmin; no non-spatial joins; non-partitionable aggregates)
  ✓ User-defined functions
  ✓ Load balancing
  ✓ Ad-hoc query rewriting
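A hedged sketch of the partitioning trade-off described above: a spatial restriction (here the qserv_areaSpec_circle UDF already shown in the query examples) lets Qserv send work only to the nodes whose sky chunks overlap the requested region, while a join on a non-spatial attribute has to compare rows across all chunks. The centre, radius and column names are illustrative only and may not match the exact LSST schema.

  -- Chunk-friendly: only workers whose chunks overlap the cone are involved.
  SELECT objectId, ra, decl
  FROM Object
  WHERE qserv_areaSpec_circle(355.0, -12.5, 0.2);

  -- Chunk-hostile: a non-spatial join over the whole Source table, one of the
  -- "impossible" query classes listed earlier.
  SELECT s1.sourceId, s2.sourceId
  FROM Source AS s1
  JOIN Source AS s2 ON (s1.psfFlux = s2.psfFlux)
  WHERE s1.sourceId < s2.sourceId;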

Data integration to the rescue (cont.): a mediation-based approach

[Figure: mediation architecture: the user query is posed against a global schema; semantic mappings relate it to views on Source 1 … Source N; wrappers 1 … N translate sub-queries for the underlying sources (documents, databases) and return the answers]

MapReduce programming model

MapReduce is a new programming model used to facilitate the development of scalable parallel computations on large server clusters [33]. The MapReduce framework provides simple programming constructs to perform a computation over an input file f through two primitives: a map function and a reduce function. It operates exclusively on ⟨key, value⟩ pairs and produces as output a set of ⟨key, value⟩ pairs. A map function takes as input a data set in the form of a set of key-value pairs and, for every pair ⟨k, v⟩ of the input, returns zero or more intermediate key-value pairs ⟨k′, v′⟩. The map outputs are then processed by the reduce function. A reduce function takes as input a pair ⟨k′, list(v′)⟩, where k′ is an intermediate key and list(v′) is the list of all the intermediate values associated with k′, and returns as final result zero or more key-value pairs ⟨k″, v″⟩. Several instantiations of the map and reduce functions can operate simultaneously. Note that while map executions do not need any coordination, a given reduce execution requires all the intermediate values associated with the same intermediate key k′ (i.e., for a given intermediate key k′, all the pairs ⟨k′, v′⟩ produced by the different map tasks must be processed by the same reduce task). Map and reduce functions can be implemented in any general-purpose programming language. Typically, MapReduce programs are executed on clusters of several nodes, and both their inputs and outputs are files in a distributed file system (e.g., the Hadoop Distributed File System, HDFS).

MapReduce approaches: a new programming model intended to facilitate the development of scalable parallel computations on large server clusters

[Figure: MapReduce execution overview]

MapReduce approaches

                  Hadoop             Hadoop++               HAIL                   HadoopDB               Hive
                  [Dean et al., 04]  [Dittrich et al., 10]  [Dittrich et al., 12]  [Abouzeid et al., 09]  [Thusoo et al., 10]
Query language    Procedural         Procedural             Procedural             Declarative            Declarative
Simple indexes    Not supported      Supported              Supported              Supported              Supported
Multi indexes     Not supported      Not supported          Supported              Supported              Supported
Complex indexes   Not supported      Not supported          Not supported          Supported              Supported
Storage system    HDFS               HDFS                   HAIL                   Classical DBMS         HDFS
Index choice      –                  Manual                 Manual                 Automatic              Manual
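To make the "procedural vs. declarative" row concrete, a hedged HiveQL-style sketch of the kind of declarative query that Hive (and, through its SQL front end, HadoopDB) compiles into MapReduce jobs; the table layout is a simplified, hypothetical version of the LSST Source catalog and the HDFS path is made up.

  -- Hypothetical, simplified Source table declared over files already in HDFS.
  CREATE EXTERNAL TABLE source (
    sourceId BIGINT,
    objectId BIGINT,
    ra       DOUBLE,
    decl     DOUBLE,
    psfFlux  DOUBLE
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION '/data/lsst/source';

  -- A declarative aggregation (number of detections and mean flux per object):
  -- the engine turns it into map and reduce tasks; no explicit MapReduce code
  -- is written, unlike with plain Hadoop.
  SELECT objectId, COUNT(*) AS n_detections, AVG(psfFlux) AS mean_flux
  FROM source
  GROUP BY objectId;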

2000"

1500" Hadoop! Hadoop++! HAIL! HadoopDB! Hive! [Dean!et!al,!04] [Di3rich!et!al,!10] [Di%rich!et!al,!12] [Abouzeid!et!al,!09] [Thusoo!et!al,!10] 1000"

Minutes" 500" Query!language procedural Procedural Procedural DeclaraGve DeclaraGve 0" 250"GB" 500"GB" 1"TB" 250"GB" 500"GB" 1"TB" Simple!indexes 25"machine"Not!supported supported 50"machines"Supported supported Supported Hive"(HDFS)" global"hash" local"hash" tuning"

loading.4me.for.1.TB9.50.machines.MulG!indexes Not!supported¥ Difference"(Hive">>"Not!supported supportedHadoopDB):"300"%"for"Supported Supported

33%. 25"machines"and"200"%"for"50"machines" 39%. ¥ Hive:" Complex!indexes Not!supported ¥ Not2X"of"Ime"for"2X"of"data"volume""!supported Not!supported supported Supported

15%. ¥ 25"!"50"machines:"same"Ime" 13%. ¥ HadoopDB:" HDFS" global"hash" local"hash" tuning" Storage! HDFS ¥ HDFS2X"of"data:"90"%K120"%""supplementary""HAIL Classical!DBMS HDFS System ¥ 25"!50"machines:"25"%""gain""

Index!choice O Manual Manual AutomaGc Manual Data Mapreduceloading (1/2) approaches Data loading (2/2) 2000" 1500" Hadoop! Hadoop++! HAIL! 6%$ HadoopDB! Hive! [Dean!et!al,!04] [Di3rich!et!al,!10] [Di%rich!et!al,!12] [Abouzeid!et!al,!09] [Thusoo!et!al,!10] 1000" 30%$

Minutes" 500" Query!language procedural Procedural Procedural DeclaraGve DeclaraGve 0" 70%$ 250"GB" 500"GB" 1"TB" 250"GB" 500"GB" 1"TB" 94%$ Simple!indexes 25"machine"Not!supported supported 50"machines"Supported supported Supported Load%&me% indexing% Hive"(HDFS)" global"hash" local"hash" tuning" Load%&me% indexing% Indexing$for$1$TB$3$50$machines$$ Indexing$for$1$TB$3$50$machines$ Hive$ HadoopDB$ loading.4me.for.1.TB9.50.machines.MulG!indexes Not!supported¥ Difference"(Hive">>"Not!supported supportedHadoopDB):"300"%"for"Supported Supported 200%

% ¥ Hive:% 33%. 25"machines"and"200"%"for"50"machines" 39%. 150% ¥ Hive:" ¥ 25%!%50%machines:%gain%of%15%% Complex!indexes Not!supported ¥ Not2X"of"Ime"for"2X"of"data"volume""!supported Not!supported supported Supported 100% 15%. ¥ 25"!"50"machines:"same"Ime" 13%. 50% ¥ Index%size%(GB) HadoopDB:" HDFS" global"hash" local"hash" tuning" Storage! HDFS ¥ HDFS2X"of"data:"90"%K120"%""supplementary""HAIL Classical!DBMS HDFS System 0% 250GB% 500%GB% ¥ 25"1TB%!50"machines:"25"%""gain""2%TB%

Sourceid% objec&d% ra% decl% scienceccdexposure% Index!choice O Manual Manual AutomaGc Manual Data Mapreduceloading (1/2) approaches Data loading (2/2) 2000" 1500" Hadoop! Hadoop++! HAIL! 6%$ HadoopDB! Hive! [Dean!et!al,!04] [Di3rich!et!al,!10] [Di%rich!et!al,!12] [Abouzeid!et!al,!09] [Thusoo!et!al,!10] 1000" 30%$ Query processing Minutes" 500" Query!language procedural Procedural Procedural DeclaraGve DeclaraGve 0" Hive70%$ vs. HadoopDB (2/6) 250"GB" 500"GB" 1"TB" 250"GB" 500"GB" 1"TB" 2000" 200" 94%$ Simple!indexes 1800"25"machine"Not!supported supported 50"machines"Supported180" supported Supported 1600" Load%&me% indexing% 160"Load%&me% indexing% Hive"(HDFS)"1400" global"hash" local"hash" tuning" 140" Indexing$for$1$TB$3$50$machines$$ Indexing$for$1$TB$3$50$machines$ 1200" 120" Hive$ HadoopDB$ 1000" 100" loading.4me.for.1.TB9.50.machines. MulG!indexes 800"Not!supported¥ Difference"(Hive">>"Not!supported supportedHadoopDB80" ):"300"%"for"Supported Supported

200%Time"(seconds)" 600" Time"(seconds)" 60"

% ¥ Hive:% 33%. 400" 25"machines"and"200"%"for"50"machines" 39%. 40" 150%200" ¥ ¥ 25%20" !%50%machines:%gain%of%15%% 0" Hive:" 0" Complex!indexes NotHadoopDB"!supportedHive" HadoopDB"¥ Not2X"of"Ime"for"2X"of"data"volume""!supportedHive" HadoopDB" NotHive"!supported Hive" supportedHadoopDB" Hive" HadoopDB"SupportedHive" HadoopDB" 100% 250"GB" 500"GB" 1"TB" 250"GB" 500"GB" 1"TB" 15%. ¥ 25"!"50"machines:"same"Ime" 13%. Q1" Q2" Q3" Q4" 50% ¥ Q1" Q2" Q3"(RA)" Q4" Index%size%(GB) HadoopDB:" HDFS" global"hash" local"hash" tuning" Storage! HDFS ¥ HDFS2X"of"data:"90"%K120"%""supplementary""HAIL Classical!DBMS HDFS System 0% 250GB% 500%GB% ¥ 25"1TB%!50"machines:"25"%""gain""2%TB%

Sourceid% objec&d% ra% decl% scienceccdexposure% Index!choice O Manual Manual AutomaGc Manual Physical storage Row store vs. column store

[Figure: row-oriented database vs. column-oriented database storage layouts]

• High cost of I/O
• Analytical queries are expensive
• Cost of joins
• Problems with null values
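A hedged illustration of why the storage layout matters for analytical workloads: the query below touches essentially a single attribute (varProb, taken from the earlier query examples) out of the roughly 500 attributes of the Object catalog, so a column store reads only that column, while a row store must scan entire rows.

  -- Count probable variables: a column store reads only the varProb column,
  -- a row store reads every full Object row.
  SELECT COUNT(*) AS n_variable
  FROM Object
  WHERE varProb > 0.75;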

Hybrid storage

[Figure: a table T split into two fragments T1 and T2, stored in a hybrid (row/column) database]

Query   Oracle        MonetDB       Hybrid
Q1      00:05:00.00   00:00:46.00   00:06:00.00
Q2      00:04:42.00   00:00:00.12   00:02:28.00
Q3      00:04:15.00   00:00:00.03   00:00:01.00
Q4      00:04:04.00   00:00:00.90   00:03:40.00
Q5      00:00:02.19   00:00:42.00   00:05:06.00
Q6      00:00:02.50   00:00:03.00   00:39:00.00
Q7      00:00:18.14   00:00:06.30   00:00:18.00
Q8      00:07:38.00   00:00:39.20   01:40:00.00
Q9      00:01:39.00   00:00:45.00   –
Q10     00:04:03.00   00:08:13.00   00:00:09.00
(times in hh:mm:ss)

Learned lessons and research directions

No one size fits all
• MapReduce-based algorithms can be useful to implement physical operators
• Hybrid systems: row/column stores
• Need for more research on
  ✓ Abstractions adequate for the scientific domain
    - An array data model (SciDB)?
  ✓ Support for user-defined functions
  ✓ Optimization techniques embedded in the data management system

  ✓ Scalability of information integration frameworks

Working across disciplines … working across cultures

[Figure: a foundational approach to building a scientific data management system, at the crossroads of data management, data mining/learning/statistics, visualisation, and the scientists]

Merci