Data Management Organization Charter

Large Synoptic Survey Telescope (LSST)
Database Design

Jacek Becla, Daniel Wang, Serge Monkewitz, K-T Lim, Douglas Smith, Bill Chickering

LDM-135
08/02/2013

Change Record

Version  Date       Description                                   Revision Author
1.0      6/15/2009  Initial version                               Jacek Becla
2.0      7/12/2011  Most sections rewritten, added scalability    Jacek Becla
                    test section
2.1      8/12/2011  Refreshed future-plans and schedule of        Jacek Becla,
                    testing sections, added section about         Daniel Wang
                    fault tolerance
3.0      8/2/2013   Synchronized with latest changes to the       Jacek Becla,
                    requirements (LSE-163). Rewrote most of the   Daniel Wang,
                    “Implementation” chapter. Documented new      Serge Monkewitz,
                    tests, refreshed all other chapters.          Kian-Tat Lim,
                                                                  Douglas Smith,
                                                                  Bill Chickering

Table of Contents

1. Executive Summary
2. Introduction
3. Baseline Architecture
    3.1 Alert Production and Up-to-date Catalog
    3.2 Data Release Production
    3.3 User Query Access
        3.3.1 Distributed and parallel
        3.3.2 Shared-nothing
        3.3.3 Indexing
        3.3.4 Shared scanning
        3.3.5 Clustering
        3.3.6 Partitioning
        3.3.7 Technology choice
4. Requirements
    4.1 General Requirements
    4.2 Data Production Related Requirements
    4.3 Query Access Related Requirements
    4.4 Discussion
        4.4.1 Implications
        4.4.2 Query complexity and access patterns
5. Potential Solutions - Research
    5.1 The Research
    5.2 The Results
    5.3 Map/Reduce-based and NoSQL Solutions
    5.4 DBMS Solutions
        5.4.1 Parallel DBMSes
        5.4.2 Object-oriented solutions
        5.4.3 Row-based vs columnar stores
        5.4.4 Appliances
    5.5 Comparison and Discussion
6. Design Trade-offs
    6.1 Standalone Tests
        6.1.1 Spatial join performance
        6.1.2 Building sub-partitions
        6.1.3 Sub-partition overhead
        6.1.4 Avoiding materializing sub-partitions
        6.1.5 Billion row table / reference catalog
        6.1.6 Compression
        6.1.7 Full table scan performance
        6.1.8 Low-volume queries
        6.1.9 Solid state disks
    6.2 Data Challenge Related Tests
        6.2.1 DC1: data ingest
        6.2.2 DC2: source/object association
        6.2.3 DC3: catalog construction
        6.2.4 Winter-2013 Data Challenge: querying database for forced photometry
        6.2.5 Winter-2013 Data Challenge: partitioning 2.6 TB table for Qserv
        6.2.6 Winter-2013 Data Challenge: multi-billion-row table
7. Risk Analysis
    7.1 Potential Key Risks
    7.2 Risks Mitigations
8. Implementation of the Query Service (Qserv) Prototype
    8.1 Components
        8.1.1 MySQL
        8.1.2 XRootD
    8.2 Partitioning
    8.3 Query Generation
        8.3.1 Processing modules
        8.3.2 Processing module overview
    8.4 Dispatch
        8.4.1 Wire protocol
        8.4.2 Frontend
        8.4.3 Worker
    8.5 Threading Model
    8.6 Aggregation
    8.7 Indexing
    8.8 Data Distribution
        8.8.1 Database data distribution