When data management meets astronomy: some lessons learned from the Petasky project

Farouk Toumani LIMOS, CNRS, Blaise Pascal University Clermont-Ferrand, France

● A fourth-generation cosmology experiment

● 8.4 m telescope

● Cerro Pachón (Chile)

● Very wide-field astronomy: 9.6 deg² camera

● The whole visible sky in 6 optical bands (20,000 deg²)

● 15 s exposures, 1 visit every 3 days

● 10 years, 60 PB of data

(Journées de l'interdisciplinarité, 10-11 December 2014, Paris, France; http://com.isima.fr/Petasky)

Mastodons: a program of the Interdisciplinary Mission of CNRS

• INS2I
  ✦ LIMOS (UMR CNRS 6158, Clermont-Ferrand)
  ✦ LIRIS (UMR CNRS 5205, Lyon)
  ✦ LABRI (UMR CNRS 5800, Bordeaux)
  ✦ LIF (UMR CNRS 7279, Marseille)
  ✦ LIRMM (UMR CNRS 5506, Montpellier)
• IN2P3
  ✦ LPC (UMR CNRS 6533, Clermont-Ferrand)
  ✦ APC (UMR CNRS 7164, Paris)
  ✦ LAL (UMR CNRS 8607, Paris)
  ✦ Centre de Calcul de l'IN2P3/CNRS (CC-IN2P3)
• INSU
  ✦ LAM (UMR CNRS 7326, Marseille)

Petasky: scientific challenges

• Management of scientific data in the fields of cosmology and astrophysics

➡ Large amount of data

➡ Complex data (e.g., images, uncertainty, multiple scales, ...)

➡ Heterogeneous formats

➡ Varied and complex processing (image analysis, trajectory reconstruction, ad-hoc queries and processing, …)

• Scientific challenges

➡ Scalability

➡ Visualisation

• Application context: the LSST project

Science in an exponential world

The availability of very large amounts of data and the ability to process them efficiently are changing the way we do science.

• Science paradigms¹

1. Empirical description of natural phenomena

2. Theoretical science: models and generalization

3. Computational science: simulation of complex phenomena to validate theories

4. Data-intensive science: collecting and analyzing large amounts of data

¹Jim Gray, eScience talk at the NRC-CSTB meeting, Mountain View, CA, 11 January 2007.

From astronomy to astroinformatics

• Early use of scientific computing and numeric simulations
  ➡ The Antikythera mechanism, between 150 and 100 BC: « … a mechanical computer used for calculating lunar, solar and stellar calendars »
• Modern digital detectors (CCDs)
  ➡ Supernovae Cosmology Project, 1986
    - 1024x1024 CCD camera, 2 megabytes every five minutes
  ➡ Sloan Digital Sky Survey (SDSS)
    - 2.5 m telescope, 54-CCD imager
    - Started working in 2000
    - In 2010, a total archive of 140 TB
  ➡ GAIA, launched in 12/2013; started scientific observations in 7/2014
• A culture of sharing data
  ➡ International Virtual Observatory Alliance (IVOA): a web of astronomical data
  ➡ Data with non-commercial value (more open than the healthcare or biomedical fields)

How many bytes…

1000^1 = 10^3    kilo (KB)
1000^2 = 10^6    mega (MB)
1000^3 = 10^9    giga (GB)
1000^4 = 10^12   tera (TB)
1000^5 = 10^15   peta (PB)

A single text character                                  1 byte
A typewritten page                                       2 kilobytes
A high-resolution photograph                             2 megabytes
The complete works of Shakespeare                        5 megabytes
A minute of high-fidelity sound                          10 megabytes
A pickup truck filled with books                         1 gigabyte
The contents of a DVD                                    17 gigabytes
A collection of the works of Beethoven                   20 gigabytes
50,000 trees made into paper and printed                 1 terabyte
The print collections of the U.S. Library of Congress    10 terabytes
All U.S. academic research libraries                     2 petabytes
All hard disk capacity developed in 1995                 20 petabytes

(Source: http://searchstorage.techtarget.com/definition/How-many-bytes-for)

Sizes of the astronomical datasets

[Figure: sizes of astronomical datasets, growing from the KB-MB range around 1980 to the PB range by 2010]

E-science evolution

• Homo FTP-GREPus
  In 2004¹: FTP/GREP 1 GB in a minute; FTP/GREP 1 TB in 2 days; FTP/GREP 1 PB in 3 years
• Homo Numericus
  Grid computing, cloud computing, virtualization, MapReduce, new hardware, NoSQL, …

¹Where the Rubber Meets the Sky: Giving Access to Science Data, Jim Gray and Alex Szalay

Data-driven discovery in Astrophysics

Telescopes and observatories → digitized data → processing pipeline → information/knowledge
• Dedicated archives serve science, education and the public

[Figure: components of a classical DBMS (Composants d'un SGBD): query compiler, DDL compiler, transaction manager, execution engine, index/file/record manager, buffer manager, storage manager, logging and recovery, concurrency control; overlaid on the data-driven discovery pipeline]

• Astrometric pipeline: outputs object positions

• Spectroscopic pipeline: redshifts, classification of objects, …

The LSST project

Large Synoptic Survey Telescope: a new window on the sky
● A fourth-generation cosmology experiment
● 8.4 m telescope
● Cerro Pachón (Chile)
● Very wide-field astronomy: 9.6 deg² camera
● The whole visible sky in 6 optical bands (20,000 deg²)
● 15 s exposures, 1 visit every 3 days
● 10 years, 60 PB of data

LSST project organisation
● Non-profit corporation
● US: 33 partners; $670M
● Chile: site
● France: IN2P3 (~€15M)
● Telescope, camera, outreach

"The data volumes […] of LSST are so large that the limitation on our ability to do science isn't the ability to collect the data, it's the ability to understand […] the data" (Andrew Connolly, U. Washington)

"How do you turn petabytes of data into scientific knowledge?" (Kirk Borne, George Mason U.)

Data management challenges in LSST

"How much the (LSST) project will tell us about our solar system, the dark energy problem and more, will depend on how well we can process the information the telescope and its camera send back to us - an estimated sum of around ten petabytes of data per year." (Mari Silbey, Space: the big data frontier, http://www.smartplanet.com/blog/thinking-tech/space-the-big-data-frontier/12180)

"Plans for sharing the data from LSST with the public are as ambitious as the telescope itself." Anyone with a computer will be able to fly through the Universe, zooming past objects a hundred million times fainter than can be observed with the unaided eye. The LSST project will provide analysis tools to enable both students and the public to participate in the process of scientific discovery.

LSST data scales

Tens of thousands of billions of photometric observations over tens of billions of objects

• 1-10 million events per night, 3 billion sources
• 16 TB every 8 hours, at a rate of 540 MB/s
• Images
  - 12.8 GB every 39 seconds (a data rate of 330 MB/s)
  - 100 PB final archive of images
  - for comparison, the highest data rate among current astronomical surveys is 4.3 MB/s, for the Sloan Digital Sky Survey (SDSS)
• Transients
  - 1-3 PB; a relation with 100 attributes and 5,000 billion tuples
• Object catalog
  - a relation with 500 attributes and 40 billion tuples
  - 100-200 TB
• Estimates for the end of the project
  - 400,000 billion tuples (different versions of the data, in addition to replication)
  - raw data: 60 PB
  - catalog database: 15 PB
  - total volume after processing: several hundred PB
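As a quick consistency check, the two quoted rates follow directly from the volumes above (the small gap with the quoted 540 MB/s presumably comes from rounding):

  \frac{16\ \mathrm{TB}}{8\ \mathrm{h}} = \frac{16 \times 10^{6}\ \mathrm{MB}}{28\,800\ \mathrm{s}} \approx 556\ \mathrm{MB/s},
  \qquad
  \frac{12.8\ \mathrm{GB}}{39\ \mathrm{s}} \approx 0.33\ \mathrm{GB/s} \approx 330\ \mathrm{MB/s}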

LSST data scales

                          LSST year 1   LSST year 10
Raw data                  6 PB          60 PB
Archive                   19 PB         270 PB
Disk (DAC)                16 PB         90 PB
DB                        0.5 PB        5 PB (baseline)
Moore equivalent (2014)   1.2 TB        12 TB

Table                      Size      #tuples   #attributes
Object                     109 TB    38 B      470
Moving Object              5 GB      6 M       100
Source                     3.6 PB    5 T       125
Forced Source              1.1 PB    32 T      7
Difference Image Source    71 TB     200 B     65
CCD Exposure               0.6 TB    17 B      45

LSST data scales

A change of scale from TB to PB

Data volumes and rates are unprecedented in astronomy

[Figure: relative étendue (m² deg²) of LSST compared with PS4, PS1, Subaru, CFHT, SDSS, MMT, DES, 4m IR, VST and VISTA; all facilities assumed operating 100% in one survey]
[Figure: estimated nightly data volume (GB), raw and catalog, for LSST, Pan-STARRS 4 and SDSS]
(Steven M. Kahn, SLUO meeting presentation, June 7, 2007)

LSST will make tens of trillions of photometric observations of tens of billions of objects.

Queries per difficulty level: supported queries

• Retrieve any type of information about a single object (identified by a given objectId), including the full time series (a few seconds):
  SELECT * FROM Object JOIN Source USING (objectId) WHERE objectId = 293848594;
• Retrieve any type of information about a group of objects in a small area of the sky, including neighborhood-type queries (≃ 1 hour):
  SELECT * FROM Object WHERE qserv_areaSpec_circle(1.0, 35.0, 5.0/60);
• Analyse light curves across a large area (≃ 1 day, 24 h):
  SELECT O.objectId, myFunction(S.taiMidPoint, S.psfFlux)
  FROM Object AS O JOIN Source AS S USING (objectId)
  WHERE O.varProb > 0.75
  GROUP BY O.objectId;
• Analyse light curves of faint objects across a large area (≃ 1 week):
  SELECT O.objectId, myFunction(V.taiMidPoint, FS.flux)
  FROM Object AS O
  JOIN ForcedSource AS FS ON (O.objectId = FS.objectId)
  JOIN Visit AS V ON (FS.visitId = V.visitId);

Queries per difficulty level: expensive/impossible queries

• Expensive queries
  - Find objects far away from other objects (for a large number of objects). Question: what is the largest distance we should plan to support for distance-based queries involving (a) a small number of objects, (b) all objects on the sky?
  - Sliding-window queries: find all 5 arcmin × 5 arcmin regions with an object density higher than ρ
• Impossible queries
  - Large result sizes
    • Select all pairs of stars within 1 arcmin of each other in the Milky Way region
  - Expensive or hidden computation (e.g., joins)
    • Near-neighbor queries on the Source or ForcedSource table
    • Joining large tables between different LSST data releases
    • Time-series analysis of every object
    • Cross-match with a very large external catalog (e.g., LSST with SKA)
    • Any non-spatial join on the entire catalog (Object, Source, ForcedSource)
    • Join of Source with ForcedSource

Examples of User Defined Functions

• q3c_ang2ipix(ra, dec) -- returns the ipix value at (ra, dec)
• q3c_dist(ra1, dec1, ra2, dec2) -- returns the distance in degrees between (ra1, dec1) and (ra2, dec2)
• q3c_join(ra1, dec1, ra2, dec2, radius) -- returns true if (ra1, dec1) is within radius spherical distance of (ra2, dec2); it should be used when the index on q3c_ang2ipix(ra2, dec2) has been created
• q3c_ellipse_join(ra1, dec1, ra2, dec2, major, ratio, pa) -- like q3c_join, except that (ra1, dec1) has to be within an ellipse with major axis major, axis ratio ratio and position angle pa (from north through east)
• q3c_radial_query(ra, dec, center_ra, center_dec, radius) -- returns true if (ra, dec) is within radius degrees of (center_ra, center_dec); this is the main function for cone searches, and should be used when the index on q3c_ang2ipix(ra, dec) has been created
• q3c_ellipse_query(ra, dec, center_ra, center_dec, maj_ax, axis_ratio, PA) -- returns true if (ra, dec) is within the ellipse centered on (center_ra, center_dec), specified by its major axis, axis ratio and position angle; should be used when the index on q3c_ang2ipix(ra, dec) has been created
• q3c_poly_query(ra, dec, poly) -- returns true if (ra, dec) is within the PostgreSQL polygon poly
• q3c_ipix2ang(ipix) -- returns a 2-array (ra, dec) corresponding to ipix
• q3c_pixarea(ipix, bits) -- returns the area, in steradians, corresponding to ipix at level bits (1 is smallest, 30 is the cube face)
• q3c_ipixcenter(ra, dec, bits) -- returns the ipix value of the pixel center, at the given depth, covering the specified (ra, dec)
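As an illustration of how these UDFs are used in practice, here is a minimal sketch of a cone search and a positional cross-match in PostgreSQL with Q3C; the table names (stars, candidates) and the chosen radii are hypothetical, only the q3c_* calls come from the list above.

  -- Functional index that q3c_radial_query and q3c_join rely on.
  CREATE INDEX stars_q3c_idx ON stars (q3c_ang2ipix(ra, dec));

  -- Cone search: every row of stars within 0.5 degrees of (ra, dec) = (11.04, 40.96).
  SELECT *
  FROM stars
  WHERE q3c_radial_query(ra, dec, 11.04, 40.96, 0.5);

  -- Positional cross-match: each candidate paired with the stars closer than 1 arcsec;
  -- the indexed table supplies the second pair of coordinates, as noted above.
  SELECT c.id, s.id, q3c_dist(c.ra, c.dec, s.ra, s.dec) AS dist_deg
  FROM candidates AS c
  JOIN stars AS s
    ON q3c_join(c.ra, c.dec, s.ra, s.dec, 1.0/3600);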

Main thesis of this talk

"How to form a new generation of scientists capable of exploiting the new technologies to pursue science goals at an unprecedented scale?" (G. Longo¹)

It should not be up to the scientists but to the technology (the data management system) to overcome the computing barriers between them and the data.

Analysts instead of hackers.

This approach has been very successful in the business domain but, in general, not so successful in the scientific domain.

¹Talk at the workshop « New challenges in astro- and environmental informatics in the Big Data era », May 2014, Szombathely, Hungary

What makes DB technology successful in the business domain?

• Abstractions
  ✓ Relations instead of files, blocks, tablespaces, segments, extents and access paths
  ✓ Relational algebra instead of algorithms
• Declarative
  ✓ Express what you want, not how to get it
• Optimization
  ✓ Rather naive techniques, but enough for the business world

Example: query compilation

SELECT Name
FROM Employé E, Département D, Projet P
WHERE E.empno = D.manager AND E.empno = P.mgr;

Example: query compilation (cont.): execution plan

[Figure: the execution plan chosen for the query: an index nested-loop join (on A.x = C.x) combining an index scan of Projet (C) with a merge join (on A.x = B.x) of sorted table scans of Employé (A) and Département (B)]
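To make the compilation step concrete, a hedged aside: most relational engines (for example PostgreSQL, or the MySQL backend mentioned later for Qserv) expose the plan chosen by the optimizer through EXPLAIN. The toy schema below is hypothetical and only mirrors the example query; it is not taken from any real system.

  -- Hypothetical tables mirroring the example (unaccented names for portability).
  CREATE TABLE Employe     (empno INT PRIMARY KEY, Name VARCHAR(100));
  CREATE TABLE Departement (deptno INT PRIMARY KEY, manager INT);
  CREATE TABLE Projet      (projno INT PRIMARY KEY, mgr INT);

  -- Show the chosen execution plan (scan methods, join order, join algorithms)
  -- without actually running the query.
  EXPLAIN
  SELECT Name
  FROM Employe E, Departement D, Projet P
  WHERE E.empno = D.manager AND E.empno = P.mgr;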

Petasky: data management challenge

Techniques to build an efficient and easy-to-use data access system at a reasonable cost:
• Specialized hardware vs. commodity machines
• Programming vs. querying
• Ad-hoc optimization vs. a generic system

Space of solutions and associated challenges

Clearly beyond the capacities of centralized systems

• Distributed and parallel systems
  ✓ Data distribution
  ✓ Computation distribution
  ✓ Failure resilience
• Storage model
  ✓ Row store vs. column store
  ✓ (Sophisticated) indexes
• Benefit from modern hardware
• Complexity theory and cost models
  ✓ Standard measures: I/O, data transfer, …
  ✓ Cost of coordination

Explored approaches

• Big data approaches
  ✓ Distributed and parallel systems
    ➡ MapReduce-like approaches (shared-nothing architecture)
    ➡ Parallel DBMSs (shared-everything architecture)
    ➡ Spatial partitioning (Qserv for LSST)
  ✓ Column-store DBMSs
    ➡ Vertica, MonetDB, …
• Data integration to the rescue
  ➡ Declarative approach

Qserv: a solution developed at SLAC (Stanford)

Orchestration tool:
● SQL parser
● Metadata database
● Geometric user-defined functions
● xrootd communication
● MySQL backend
● Result aggregation

Partitioning:
● Geometric (for neighborhood queries)
● Objects and their sources on the same node

Practical limits on:
● Which joins are possible
● Result size
● Computation time

PetaSky studies on Qserv:
● Large-scale test (at CC-IN2P3): 300 physical nodes with 120 GB of disk and 16 GB of RAM each; 15 TB in 3,000 chunks, i.e. about 50 GB per node
● Parallelized deployment: disk space = 2× the database size
● Overhead tests (with tables in cache)
● Performance instabilities when going from 150 to 300 nodes (communications, multi-threading)
● Response time roughly proportional to the number of nodes

• Limitations
  ✓ Limited queries (distances of ~1 arcmin; no non-spatial joins; non-partitionable aggregates)
  ✓ User-defined functions
  ✓ Load balancing
  ✓ Ad-hoc query rewriting
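A hedged sketch of the partitioning trade-off described above: a spatial restriction (here the qserv_areaSpec_circle UDF already shown in the query examples) lets Qserv send work only to the nodes whose sky chunks overlap the requested region, while a join on a non-spatial attribute has to compare rows across all chunks. The centre, radius and column names are illustrative only and may not match the exact LSST schema.

  -- Chunk-friendly: only workers whose chunks overlap the cone are involved.
  SELECT objectId, ra, decl
  FROM Object
  WHERE qserv_areaSpec_circle(355.0, -12.5, 0.2);

  -- Chunk-hostile: a non-spatial join over the whole Source table, one of the
  -- "impossible" query classes listed earlier.
  SELECT s1.sourceId, s2.sourceId
  FROM Source AS s1
  JOIN Source AS s2 ON (s1.psfFlux = s2.psfFlux)
  WHERE s1.sourceId < s2.sourceId;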

Data integration to the rescue (cont.): a mediation-based approach

[Figure: mediation architecture: the user query is posed against a global schema; semantic mappings relate it to views on Source 1 … Source N; wrappers 1 … N translate sub-queries for the underlying sources (documents, databases) and return the answers]

MapReduce programming model

MapReduce is a new programming model used to facilitate the development of scalable parallel computations on large server clusters [33]. The MapReduce framework provides simple programming constructs to perform a computation over an input file f through two primitives: a map function and a reduce function. It operates exclusively on ⟨key, value⟩ pairs and produces as output a set of ⟨key, value⟩ pairs. A map function takes as input a data set in the form of a set of key-value pairs and, for every pair ⟨k, v⟩ of the input, returns zero or more intermediate key-value pairs ⟨k′, v′⟩. The map outputs are then processed by the reduce function. A reduce function takes as input a pair ⟨k′, list(v′)⟩, where k′ is an intermediate key and list(v′) is the list of all the intermediate values associated with k′, and returns as final result zero or more key-value pairs ⟨k″, v″⟩. Several instantiations of the map and reduce functions can operate simultaneously. Note that while map executions do not need any coordination, a given reduce execution requires all the intermediate values associated with the same intermediate key k′ (i.e., for a given intermediate key k′, all the pairs ⟨k′, v′⟩ produced by the different map tasks must be processed by the same reduce task). Map and reduce functions can be implemented in any general-purpose programming language. Typically, MapReduce programs are executed on clusters of several nodes, and both their inputs and outputs are files in a distributed file system (e.g., the Hadoop Distributed File System, HDFS).

MapReduce approaches: a new programming model intended to facilitate the development of scalable parallel computations on large server clusters

[Figure: MapReduce execution overview]

MapReduce approaches

                  Hadoop             Hadoop++               HAIL                   HadoopDB               Hive
                  [Dean et al., 04]  [Dittrich et al., 10]  [Dittrich et al., 12]  [Abouzeid et al., 09]  [Thusoo et al., 10]
Query language    Procedural         Procedural             Procedural             Declarative            Declarative
Simple indexes    Not supported      Supported              Supported              Supported              Supported
Multi indexes     Not supported      Not supported          Supported              Supported              Supported
Complex indexes   Not supported      Not supported          Not supported          Supported              Supported
Storage system    HDFS               HDFS                   HAIL                   Classical DBMS         HDFS
Index choice      –                  Manual                 Manual                 Automatic              Manual
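To make the "procedural vs. declarative" row concrete, a hedged HiveQL-style sketch of the kind of declarative query that Hive (and, through its SQL front end, HadoopDB) compiles into MapReduce jobs; the table layout is a simplified, hypothetical version of the LSST Source catalog and the HDFS path is made up.

  -- Hypothetical, simplified Source table declared over files already in HDFS.
  CREATE EXTERNAL TABLE source (
    sourceId BIGINT,
    objectId BIGINT,
    ra       DOUBLE,
    decl     DOUBLE,
    psfFlux  DOUBLE
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION '/data/lsst/source';

  -- A declarative aggregation (number of detections and mean flux per object):
  -- the engine turns it into map and reduce tasks; no explicit MapReduce code
  -- is written, unlike with plain Hadoop.
  SELECT objectId, COUNT(*) AS n_detections, AVG(psfFlux) AS mean_flux
  FROM source
  GROUP BY objectId;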

2000"

1500" Hadoop! Hadoop++! HAIL! HadoopDB! Hive! [Dean!et!al,!04] [Di3rich!et!al,!10] [Di%rich!et!al,!12] [Abouzeid!et!al,!09] [Thusoo!et!al,!10] 1000"

Minutes" 500" Query!language procedural Procedural Procedural DeclaraGve DeclaraGve 0" 250"GB" 500"GB" 1"TB" 250"GB" 500"GB" 1"TB" Simple!indexes 25"machine"Not!supported supported 50"machines"Supported supported Supported Hive"(HDFS)" global"hash" local"hash" tuning"

loading.4me.for.1.TB9.50.machines.MulG!indexes Not!supported¥ Difference"(Hive">>"Not!supported supportedHadoopDB):"300"%"for"Supported Supported

33%. 25"machines"and"200"%"for"50"machines" 39%. ¥ Hive:" Complex!indexes Not!supported ¥ Not2X"of"Ime"for"2X"of"data"volume""!supported Not!supported supported Supported

15%. ¥ 25"!"50"machines:"same"Ime" 13%. ¥ HadoopDB:" HDFS" global"hash" local"hash" tuning" Storage! HDFS ¥ HDFS2X"of"data:"90"%K120"%""supplementary""HAIL Classical!DBMS HDFS System ¥ 25"!50"machines:"25"%""gain""

Index!choice O Manual Manual AutomaGc Manual Data Mapreduceloading (1/2) approaches Data loading (2/2) 2000" 1500" Hadoop! Hadoop++! HAIL! 6%$ HadoopDB! Hive! [Dean!et!al,!04] [Di3rich!et!al,!10] [Di%rich!et!al,!12] [Abouzeid!et!al,!09] [Thusoo!et!al,!10] 1000" 30%$

Minutes" 500" Query!language procedural Procedural Procedural DeclaraGve DeclaraGve 0" 70%$ 250"GB" 500"GB" 1"TB" 250"GB" 500"GB" 1"TB" 94%$ Simple!indexes 25"machine"Not!supported supported 50"machines"Supported supported Supported Load%&me% indexing% Hive"(HDFS)" global"hash" local"hash" tuning" Load%&me% indexing% Indexing$for$1$TB$3$50$machines$$ Indexing$for$1$TB$3$50$machines$ Hive$ HadoopDB$ loading.4me.for.1.TB9.50.machines.MulG!indexes Not!supported¥ Difference"(Hive">>"Not!supported supportedHadoopDB):"300"%"for"Supported Supported 200%

% ¥ Hive:% 33%. 25"machines"and"200"%"for"50"machines" 39%. 150% ¥ Hive:" ¥ 25%!%50%machines:%gain%of%15%% Complex!indexes Not!supported ¥ Not2X"of"Ime"for"2X"of"data"volume""!supported Not!supported supported Supported 100% 15%. ¥ 25"!"50"machines:"same"Ime" 13%. 50% ¥ Index%size%(GB) HadoopDB:" HDFS" global"hash" local"hash" tuning" Storage! HDFS ¥ HDFS2X"of"data:"90"%K120"%""supplementary""HAIL Classical!DBMS HDFS System 0% 250GB% 500%GB% ¥ 25"1TB%!50"machines:"25"%""gain""2%TB%

Sourceid% objec&d% ra% decl% scienceccdexposure% Index!choice O Manual Manual AutomaGc Manual Data Mapreduceloading (1/2) approaches Data loading (2/2) 2000" 1500" Hadoop! Hadoop++! HAIL! 6%$ HadoopDB! Hive! [Dean!et!al,!04] [Di3rich!et!al,!10] [Di%rich!et!al,!12] [Abouzeid!et!al,!09] [Thusoo!et!al,!10] 1000" 30%$ Query processing Minutes" 500" Query!language procedural Procedural Procedural DeclaraGve DeclaraGve 0" Hive70%$ vs. HadoopDB (2/6) 250"GB" 500"GB" 1"TB" 250"GB" 500"GB" 1"TB" 2000" 200" 94%$ Simple!indexes 1800"25"machine"Not!supported supported 50"machines"Supported180" supported Supported 1600" Load%&me% indexing% 160"Load%&me% indexing% Hive"(HDFS)"1400" global"hash" local"hash" tuning" 140" Indexing$for$1$TB$3$50$machines$$ Indexing$for$1$TB$3$50$machines$ 1200" 120" Hive$ HadoopDB$ 1000" 100" loading.4me.for.1.TB9.50.machines. MulG!indexes 800"Not!supported¥ Difference"(Hive">>"Not!supported supportedHadoopDB80" ):"300"%"for"Supported Supported

200%Time"(seconds)" 600" Time"(seconds)" 60"

% ¥ Hive:% 33%. 400" 25"machines"and"200"%"for"50"machines" 39%. 40" 150%200" ¥ ¥ 25%20" !%50%machines:%gain%of%15%% 0" Hive:" 0" Complex!indexes NotHadoopDB"!supportedHive" HadoopDB"¥ Not2X"of"Ime"for"2X"of"data"volume""!supportedHive" HadoopDB" NotHive"!supported Hive" supportedHadoopDB" Hive" HadoopDB"SupportedHive" HadoopDB" 100% 250"GB" 500"GB" 1"TB" 250"GB" 500"GB" 1"TB" 15%. ¥ 25"!"50"machines:"same"Ime" 13%. Q1" Q2" Q3" Q4" 50% ¥ Q1" Q2" Q3"(RA)" Q4" Index%size%(GB) HadoopDB:" HDFS" global"hash" local"hash" tuning" Storage! HDFS ¥ HDFS2X"of"data:"90"%K120"%""supplementary""HAIL Classical!DBMS HDFS System 0% 250GB% 500%GB% ¥ 25"1TB%!50"machines:"25"%""gain""2%TB%

Sourceid% objec&d% ra% decl% scienceccdexposure% Index!choice O Manual Manual AutomaGc Manual Physical storage Row store vs. column store

[Figure: row-oriented database vs. column-oriented database storage layouts]

• High cost of I/O
• Analytical queries are expensive
• Cost of joins
• Problems with null values
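A hedged illustration of why the storage layout matters for analytical workloads: the query below touches essentially a single attribute (varProb, taken from the earlier query examples) out of the roughly 500 attributes of the Object catalog, so a column store reads only that column, while a row store must scan entire rows.

  -- Count probable variables: a column store reads only the varProb column,
  -- a row store reads every full Object row.
  SELECT COUNT(*) AS n_variable
  FROM Object
  WHERE varProb > 0.75;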

Hybrid storage

[Figure: a table T split into two fragments T1 and T2, stored in a hybrid (row/column) database]

Query   Oracle        MonetDB       Hybrid
Q1      00:05:00.00   00:00:46.00   00:06:00.00
Q2      00:04:42.00   00:00:00.12   00:02:28.00
Q3      00:04:15.00   00:00:00.03   00:00:01.00
Q4      00:04:04.00   00:00:00.90   00:03:40.00
Q5      00:00:02.19   00:00:42.00   00:05:06.00
Q6      00:00:02.50   00:00:03.00   00:39:00.00
Q7      00:00:18.14   00:00:06.30   00:00:18.00
Q8      00:07:38.00   00:00:39.20   01:40:00.00
Q9      00:01:39.00   00:00:45.00   –
Q10     00:04:03.00   00:08:13.00   00:00:09.00
(times in hh:mm:ss)

Learned lessons and research directions

No one size fits all
• MapReduce-based algorithms can be useful to implement physical operators
• Hybrid systems: row/column stores
• Need for more research on
  ✓ Abstractions adequate for the scientific domain
    - An array data model (SciDB)?
  ✓ Support for user-defined functions
  ✓ Optimization techniques embedded in the data management system

  ✓ Scalability of information integration frameworks

Working across disciplines … working across cultures

[Figure: a foundational approach to building a scientific data management system, at the crossroads of data management, data mining/learning/statistics, visualisation, and the scientists]

Merci