Data Management Organization Charter
Large Synoptic Survey Telescope (LSST) Database Design
Jacek Becla, Daniel Wang, Serge Monkewitz, K-T Lim, Douglas Smith, Bill Chickering
LDM-135 08/02/2013

Change Record

Version  Date       Description                                      Revision Author
1.0      6/15/2009  Initial version                                  Jacek Becla
2.0      7/12/2011  Most sections rewritten, added scalability test  Jacek Becla
                    section
2.1      8/12/2011  Refreshed future-plans and schedule of testing   Jacek Becla,
                    sections, added section about fault tolerance    Daniel Wang
3.0      8/2/2013   Synchronized with latest changes to the          Jacek Becla,
                    requirements (LSE-163). Rewrote most of the      Daniel Wang,
                    “Implementation” chapter. Documented new         Serge Monkewitz,
                    tests, refreshed all other chapters.             Kian-Tat Lim,
                                                                     Douglas Smith,
                                                                     Bill Chickering

Table of Contents

1. Executive Summary
2. Introduction
3. Baseline Architecture
    3.1 Alert Production and Up-to-date Catalog
    3.2 Data Release Production
    3.3 User Query Access
        3.3.1 Distributed and parallel
        3.3.2 Shared-nothing
        3.3.3 Indexing
        3.3.4 Shared scanning
        3.3.5 Clustering
        3.3.6 Partitioning
        3.3.7 Technology choice
4. Requirements
    4.1 General Requirements
    4.2 Data Production Related Requirements
    4.3 Query Access Related Requirements
    4.4 Discussion
        4.4.1 Implications
        4.4.2 Query complexity and access patterns
5. Potential Solutions - Research
    5.1 The Research
    5.2 The Results
    5.3 Map/Reduce-based and NoSQL Solutions
    5.4 DBMS Solutions
        5.4.1 Parallel DBMSes
        5.4.2 Object-oriented solutions
        5.4.3 Row-based vs columnar stores
        5.4.4 Appliances
    5.5 Comparison and Discussion
6. Design Trade-offs
    6.1 Standalone Tests
        6.1.1 Spatial join performance
        6.1.2 Building sub-partitions
        6.1.3 Sub-partition overhead
        6.1.4 Avoiding materializing sub-partitions
        6.1.5 Billion row table / reference catalog
        6.1.6 Compression
        6.1.7 Full table scan performance
        6.1.8 Low-volume queries
        6.1.9 Solid state disks
    6.2 Data Challenge Related Tests
        6.2.1 DC1: data ingest
        6.2.2 DC2: source/object association
        6.2.3 DC3: catalog construction
        6.2.4 Winter-2013 Data Challenge: querying database for forced photometry
        6.2.5 Winter-2013 Data Challenge: partitioning 2.6 TB table for Qserv
        6.2.6 Winter-2013 Data Challenge: multi-billion-row table
7. Risk Analysis
    7.1 Potential Key Risks
    7.2 Risks Mitigations
8. Implementation of the Query Service (Qserv) Prototype
    8.1 Components
        8.1.1 MySQL
        8.1.2 XRootD
    8.2 Partitioning
    8.3 Query Generation
        8.3.1 Processing modules
        8.3.2 Processing module overview
    8.4 Dispatch
        8.4.1 Wire protocol
        8.4.2 Frontend
        8.4.3 Worker
    8.5 Threading Model
    8.6 Aggregation
    8.7 Indexing
    8.8 Data Distribution
        8.8.1 Database data distribution