Collaborative data-driven science

Gerard Lemson, Alex Szalay, Mike Rippin
DIBBS/SciServer

• Started with the SDSS SkyServer
• Built in a few months in 2001
• Goal: instant access to rich content
• Idea: bring the analysis to the data
• Interactive access at the core
• Much of the scientific process is about data
  ◦ Data collection, data cleaning, data archiving, data organization, data publishing, mirroring, data distribution, data analytics, data curation…

Form-Based Queries

Image Access

Custom SQL

Batch Queries, MyDB

Cosmological Simulations

Turbulence Database

Web Service Access through Python
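
A minimal sketch of web-service access from Python, assuming a SkyServer-style SQL search endpoint that takes the SQL text in a `cmd` parameter and an output `format`. The base URL and parameter names are assumptions modeled on the public SkyServer SqlSearch service and may differ between data releases; the sketch only builds the request URL and leaves the network call to the caller.

```python
from urllib.parse import urlencode

# ASSUMPTION: endpoint path and parameter names are modeled on the public
# SkyServer SqlSearch web service; check them against the data release you use.
SQL_SEARCH_URL = "https://skyserver.sdss.org/dr16/SkyServerWS/SearchTools/SqlSearch"


def build_sql_search_url(sql: str, fmt: str = "json", base_url: str = SQL_SEARCH_URL) -> str:
    """Return a GET URL that submits `sql` to a SkyServer-style SQL web service."""
    # urlencode percent-escapes commas, quotes and operators and turns spaces into '+'.
    query = urlencode({"cmd": sql, "format": fmt})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    url = build_sql_search_url("SELECT TOP 10 objID, ra, dec FROM PhotoObj")
    print(url)
    # Fetching is left to the caller, e.g. urllib.request.urlopen(url).read().
```

Keeping URL construction separate from the request makes the escaping testable offline; `urllib.request.urlopen` from the standard library can then fetch the result when network access is available.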

• Interactive science on petascale data
• Sustain and enhance our effort
• Create scalable open numerical laboratories
• Scale system to many petabytes
• Deep integration with the “Long Tail”
• Large footprint across many disciplines
  ◦ Also: Genomics, Oceanography, Materials Science
• Use commonly shared building blocks
• Major national and international impact

• Offer more computing resources server-side
• Augment and combine SQL queries with easy-to-use scripting tools
• Heavy use of virtual machines
• Interactive portal via IPython/Matlab/R
• Batch jobs
• Enhanced visualization tools

• CasJobs
  ◦ SQL, MyDB, batch
  ◦ FileDB: raw data access from within the RDB
• SciDrive
  ◦ Dropbox-like, on-drop event handling
• SciServer/compute
  ◦ Interactive/batch Python, R, Matlab in Docker containers
• MyScratch (File & DB)
• SSO on all components
• All published through REST
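
The CasJobs pattern of running SQL next to the data and saving results into a personal MyDB can be sketched with Python's built-in `sqlite3`, standing in for the SQL Server databases behind CasJobs. The `photoobj` table, its columns, and the `make_catalog` and `run_into_mydb` helpers are hypothetical; the point is that a query's output is materialized as a table in the user's own database (in CasJobs, `SELECT ... INTO mydb.table`) and queried again there instead of being downloaded.

```python
import sqlite3


def make_catalog() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in for a shared survey catalog (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE photoobj (objid INTEGER PRIMARY KEY, ra REAL, dec REAL, r REAL)")
    con.executemany(
        "INSERT INTO photoobj VALUES (?, ?, ?, ?)",
        [(1, 180.0, 0.5, 17.2), (2, 180.1, 0.4, 21.9), (3, 181.0, -0.2, 18.8)],
    )
    con.commit()  # close the implicit transaction so ATTACH can run later
    return con


def run_into_mydb(con: sqlite3.Connection, select_sql: str, table: str) -> int:
    """Materialize `select_sql` as table `mydb.<table>` and return its row count.

    Hypothetical helper mimicking CasJobs' SELECT ... INTO mydb.<table>.
    `table` is interpolated into SQL, so it must come from trusted code.
    """
    # Attach a private per-user database once; ':memory:' keeps the sketch self-contained.
    attached = {row[1] for row in con.execute("PRAGMA database_list")}
    if "mydb" not in attached:
        con.execute("ATTACH DATABASE ':memory:' AS mydb")
    con.execute(f"DROP TABLE IF EXISTS mydb.{table}")
    con.execute(f"CREATE TABLE mydb.{table} AS {select_sql}")
    con.commit()
    return con.execute(f"SELECT COUNT(*) FROM mydb.{table}").fetchone()[0]


if __name__ == "__main__":
    con = make_catalog()
    n = run_into_mydb(con, "SELECT objid, ra, dec FROM photoobj WHERE r < 19", "bright")
    # Follow-up analysis runs against the saved table, server-side.
    print(n, con.execute("SELECT objid FROM mydb.bright ORDER BY objid").fetchall())
    # prints: 2 [(1,), (3,)]
```

Writing results into a server-side table keeps large intermediate results next to the data, which is the "bring the analysis to the data" idea from the first slide applied to a single query.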

[Figure: SciServer architecture. Web UIs and clients (Login Portal, SkyServer, CasJobs UI, Cosmology, Turbulence, SciScript, SciDrive) connect to the back end through REST APIs; storage includes MyScratch Files, MyScratch DB, and OpenStack Keystone & Swift; CasJobs runs a Job Scheduler with BatchAdmin; SkyQuery's Parallel X-Match Engine and scheduler span SkyNode servers for SDSS (DR1 to DR10), 2MASS, GALEX, WISE, USNOB, IRAS, FIRST, ROSAT and GLUSEEN, listed in a registry; MyDB, SDSS DB and miscellaneous DB servers are joined by linked server connections.]
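
The diagram's SkyQuery component includes a parallel cross-match engine over the SkyNode catalogs. The sketch below shows only the basic operation such an engine parallelizes: matching each source in one catalog to its nearest counterpart in another within a search radius, using the haversine great-circle distance. It is a brute-force illustration assuming small in-memory lists, not SkyQuery's algorithm; the function names and the (id, ra, dec) tuple layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Source = Tuple[str, float, float]  # hypothetical layout: (identifier, ra_deg, dec_deg)


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees via the haversine formula (handles RA wrap at 360)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin outside its domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(a: Sequence[Source], b: Sequence[Source],
                radius_arcsec: float) -> List[Tuple[str, str, float]]:
    """For each source in `a`, return (id_a, id_b, separation_arcsec) for its nearest
    neighbour in `b` within `radius_arcsec`; unmatched sources are omitted.

    Brute force, O(len(a) * len(b)); a production engine would need spatial
    indexing and would partition the work across servers.
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for id_a, ra_a, dec_a in a:
        best = None  # (id_b, separation_deg) of the closest candidate so far
        for id_b, ra_b, dec_b in b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (id_b, sep)
        if best is not None:
            matches.append((id_a, best[0], best[1] * 3600.0))
    return matches


if __name__ == "__main__":
    survey_a = [("a1", 180.0, 0.0), ("a2", 10.0, 10.0)]
    survey_b = [("b1", 180.0, 0.0002), ("b2", 180.0, 0.01)]
    print(cross_match(survey_a, survey_b, radius_arcsec=1.0))
```

Here only `a1` finds a counterpart (`b1`, about 0.72 arcsec away); `b2` lies 36 arcsec off and `a2` has no neighbour inside the radius.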

• Jupyter Notebooks in Docker
  ◦ http://www..com/news/interactive-notebooks-sharing-the-code-1.16261
  ◦ https://developer.rackspace.com/blog/how-did-we-serve-more-than-20000-ipython-notebooks-for-nature/
• Python, R, Matlab
• Flexible way to attach data sets in volume containers
• Extended to batch jobs
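
One reading of "attach data sets in volume containers" is Docker's data-volume-container pattern, in which each data set lives in its own container and a notebook container mounts it with `--volumes-from`. Assuming that pattern, the sketch below builds (but does not run) a `docker run` command; the image name, the data-container names, and the assumption that the image serves Jupyter on container port 8888 are all hypothetical.

```python
import shlex
from typing import List, Sequence


def notebook_container_cmd(image: str, data_containers: Sequence[str],
                           host_port: int = 8888) -> List[str]:
    """Build a `docker run` argv for a notebook container with data sets attached.

    Hypothetical helper: assumes `image` serves Jupyter on container port 8888
    and that each data set is exposed by its own data-volume container.
    """
    # --rm removes the container on exit; -p maps host_port to the notebook port.
    cmd = ["docker", "run", "--rm", "-p", f"{host_port}:8888"]
    for name in data_containers:
        # ':ro' mounts that container's volumes read-only, protecting shared data.
        cmd += ["--volumes-from", f"{name}:ro"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    cmd = notebook_container_cmd("example/notebook:latest", ["sdss-data", "turbulence-data"])
    print(shlex.join(cmd))
    # To actually launch (requires Docker): subprocess.run(cmd, check=True)
```

Returning an argument list rather than a shell string avoids quoting problems when the command is passed to `subprocess.run`; a batch job could reuse the same mounts without the port mapping.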

Astronomy

Materials Science

Turbulence

Genomics

I’d be happy to demo and discuss our services.

24