Science Data Vaults in MonetDB: A Case Study


Universidade do Minho
Escola de Engenharia
Departamento de Informática

Master's Thesis, Master in Informatics Engineering

Science Data Vaults in MonetDB: A Case Study

João Nuno Araújo Sá

Supervisors:
Prof. Dr. José Orlando Pereira, Departamento de Informática, Universidade do Minho
Prof. Dr. Martin Kersten, Centrum Wiskunde & Informatica, Amsterdam

July 2011

Declaration

Name: João Nuno Araújo Sá
Email: [email protected]
Telephone: +351964508853
ID Card: 13171868
Thesis Title: Science Data Vaults in MonetDB: A Case Study
Supervisors: Prof. Dr. José Orlando Pereira, Prof. Dr. Martin Kersten, Dra. Milena Ivanova
Year of Completion: 2011
Designation of Master: Master in Informatics Engineering

Full reproduction of this thesis is authorized for research purposes only, upon written declaration by the interested party, who commits to that effect.

University of Minho, 13th July 2011
João Nuno Araújo Sá

Experience is what you get when you didn't get what you wanted.
Randy Pausch (The Last Lecture)

Acknowledgments

To Dr. José Orlando Pereira, for accepting to be my supervisor and for giving me the opportunity to do my master's thesis in Amsterdam. Despite the distance, all the emails and recommendations were very helpful, and I am thankful.

To Dr. Martin Kersten, for receiving me at such a renowned place as CWI and for giving me the opportunity to be responsible for this project.

To Bart Scheers, for all the patience in talking about astronomical concepts and for suggesting ideas to build a robust and solid use case.

A special thanks to Milena Ivanova for all the support, advice, inspiration and friendship. Her help during all the meetings was essential, and for all the intensive corrections to the text I am extremely grateful.

To all the people at CWI, in particular the INS-1 group, for making me feel like one of them through their kindness and professionalism.

To my father, José Manuel Araújo Fernandes Sá, and my mother, Fernanda da Conceição Pereira de Araújo Sá, for all the support, understanding and longing during this year I spent abroad. To all my family, to my old friends from Viana do Castelo, and to the friends I made in Braga.

To all my Amsterdam friends, for making this year memorable. I will never forget the moments we had together.

Resumo

Nowadays, the amount of data generated by scientific instruments (captured data) and by computer simulations (generated data) is very large. Data volumes keep growing, both through improvements in the precision of new instruments and through the increasing number of stations collecting the data. This requires new scientific methods to analyse and organize the data.

However, it is not easy to deal with this data and with all the steps it must go through (capture, organization, analysis, visualization and publication). A lot of data is collected (capture) but never curated (organization, analysis) or published.

In this thesis we focus on astronomical data, which is generally stored in FITS (Flexible Image Transport System) files. We investigate accessing this data, and querying the information it contains, using database technology. The target database is MonetDB, an open-source column-store database that has already proven successful in analytical workloads and scientific applications (SkyServer).
Given the results obtained during the experiments, the clear superiority shown by MonetDB over the STILTS tool when more computation is required, and, finally, the success in executing the set of tests proposed by the astronomer working at CWI, we can state that MonetDB is a strong and robust alternative for manipulating and accessing the information contained in FITS files.

Abstract

Nowadays, the amount of data generated by scientific instruments (data captured) and computer simulations (data generated) is very large. The data volumes are getting bigger, due to the improved precision of new instruments and the increasing number of collecting stations. This requires new scientific methods to analyse and organize the data.

However, it is not so easy to deal with this data and with all the steps the data has to go through (capture, organize, analyze, visualize, and publish). A lot of data is collected (captured), but not curated (organized, analyzed) or published.

In this thesis we focus on astronomical data, which is typically stored in FITS (Flexible Image Transport System) files. We investigate the access and querying of this data by means of database technology. The target database system is MonetDB, an open-source column-store database with a record of successful application to analytical workloads and scientific applications (SkyServer).

Given the results of the experiments, the clear superiority shown by MonetDB over STILTS when more computation is required, and the success obtained during the execution of the use case proposed by an astronomer working at CWI, we can declare that MonetDB is a powerful and robust alternative for manipulating and accessing information contained in FITS files.
Contents

1 Introduction
  1.1 The need to integrate with repositories
  1.2 Assumptions
  1.3 Contributions
  1.4 Approach
  1.5 Project Objectives
  1.6 Outline of report
2 Background
  2.1 Introduction to MonetDB
  2.2 Introduction to FITS
    2.2.1 Applications of the FITS
    2.2.2 The structure of a FITS file
3 Contribution to MonetDB
  3.1 Overview of the vaults
  3.2 Architecture of the vault
  3.3 Attach a file
  3.4 Attach all FITS files in the directory
  3.5 Attach all FITS files in the directory, using a pattern
  3.6 Table loading
    3.6.1 Search for the ideal batch size
    3.6.2 BAT size representation of Strings in MonetDB
  3.7 Export a table
4 Case Study
  4.1 Overview
  4.2 Attach a file
  4.3 Attach all FITS files in the directory
  4.4 Attach all FITS files in the directory, giving a pattern
  4.5 Load a table
  4.6 Export a table
  4.7 Cross-matching astronomical surveys
    4.7.1 Query 1: Distribution of distances between sources in both surveys
    4.7.2 Distribution of the distances smaller than 45 arc seconds
    4.7.3 Normal Distribution of all the data
    4.7.4 Frequency of the distances smaller than 5 arc seconds
    4.7.5 Frequency of the r value between sources in both surveys
    4.7.6 Query 2: extract & compare brightness in different frequencies
    4.7.7 Query 3: extract the spectral index
    4.7.8 Distribution of the spectral index
    4.7.9 Normal distribution of the spectral indexes
5 Performance Experiments
  5.1 Experimental Setting
  5.2 Test Files
  5.3 Delegation experiments
    5.3.1 Selection and Filter delegation for Group number 1
    5.3.2 Range Delegation for Group number 1
    5.3.3 Statistics Delegation for Group number 1
    5.3.4 Selection and Filter Delegation for Group number 2
    5.3.5 Range delegation for Group number 2
    5.3.6 MonetDB problem
    5.3.7 Projection delegation for Group number 2
    5.3.8 Statistical Delegation for Group number 2
    5.3.9 Summary of the tests for the first and second groups
    5.3.10 Equi-join delegation for Group number 3
    5.3.11 Band-join delegation for Group number 3
6 Related Work
  6.1 CFITSIO
    6.1.1 Fv
  6.2 STIL
    6.2.1 TOPCAT
    6.2.2 STILTS
  6.3 Comparison between tools
  6.4 Astronomical data formats
    6.4.1 HDF5 Array Database
    6.4.2 VOTable
    6.4.3 Comparison between file formats
7 Conclusion
  7.1 Results and Overview
  7.2 Future Work
Bibliography

List of Figures

3.1 Three layers of a vault
3.2 Load of all the numerical types
3.3 BAT and File sizes for each one of the numerical types
3.4 Batch 1 for the strings with 4 bytes
3.5 Batch 1 for the strings with 8 bytes
3.6 Batch 1 for the strings with 20 bytes
3.7 Batch 10
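Chapters 3 and 4 describe an attach/load/query/export workflow for FITS files inside MonetDB. As a rough sketch of what such a session could look like from Python, using the pymonetdb client: the vault procedure names fitsattach and fitsload are taken from the chapter titles and may differ between MonetDB builds, and the file path, table name, and column names are invented for the example.

```python
import pymonetdb

# Connect to a running MonetDB instance (default demo credentials).
conn = pymonetdb.connect(database="demo", hostname="localhost",
                         username="monetdb", password="monetdb")
cur = conn.cursor()

# Attach: register the FITS file's metadata in the vault catalog.
# In the vault design, this is cheap; no payload data is copied yet.
cur.execute("CALL fitsattach('/data/survey/field_001.fits');")

# Load: materialize one attached FITS extension as a relational table.
cur.execute("CALL fitsload('field_001_1');")

# From here on it is ordinary SQL over ordinary columns.
cur.execute("SELECT ra, decl, flux FROM field_001_1 WHERE flux > 0.5;")
for row in cur.fetchmany(5):
    print(row)

conn.close()
```

The split between attaching and loading is the defining trait of the vault approach: file metadata becomes queryable immediately, while the expensive column loading (the subject of Section 3.6) is deferred until a table is actually needed.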
Recommended publications
  • MonetDB/X100: Hyper-Pipelining Query Execution
MonetDB/X100: Hyper-Pipelining Query Execution. Peter Boncz, Marcin Zukowski, Niels Nes. CWI, Kruislaan 413, Amsterdam, The Netherlands ([email protected]).

Abstract: Database systems tend to achieve only low IPC (instructions-per-cycle) efficiency on modern CPUs in compute-intensive application areas like decision support, OLAP and multimedia retrieval. This paper starts with an in-depth investigation into the reason why this happens, focusing on the TPC-H benchmark. Our analysis of various relational systems and MonetDB leads us to a new set of guidelines for designing a query processor. The second part of the paper describes the architecture of our new X100 query engine for the MonetDB system that follows these guidelines. On the surface, it resembles a classical Volcano-style engine, but the crucial difference of basing all execution on the concept of vector processing makes it highly CPU efficient.

One would expect that query-intensive database workloads such as decision support, OLAP, data mining, but also multimedia retrieval, all of which require many independent calculations, should provide modern CPUs the opportunity to get near-optimal IPC (instructions-per-cycle) efficiencies. However, research has shown that database systems tend to achieve low IPC efficiency on modern CPUs in these application areas [6, 3]. We question whether it should really be that way. Going beyond the (important) topic of cache-conscious query processing, we investigate in detail how relational database systems interact with modern super-scalar CPUs in query-intensive workloads, in particular the TPC-H decision support benchmark. The main conclusion we draw from this investigation is that the architecture employed by most DBMSs inhibits compilers from using their most performance-critical optimization techniques, resulting in low CPU efficiency.
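The vector-processing argument is easy to reproduce in miniature. The sketch below is an illustration of the principle, not X100 itself: evaluating a selection one tuple at a time pays interpretation overhead per value, while evaluating it one vector at a time pays that overhead once per array. The data and predicate are made up for the demo.

```python
import time
import numpy as np

values = np.random.rand(1_000_000)

# Tuple-at-a-time: one interpreter round-trip per value, the pattern
# the paper associates with low IPC.
start = time.perf_counter()
total_tuple = 0.0
for v in values:
    if v > 0.5:              # predicate evaluated per tuple
        total_tuple += v
t_tuple = time.perf_counter() - start

# Vector-at-a-time: each primitive works on a whole array, so the
# inner loop runs in tight compiled code.
start = time.perf_counter()
mask = values > 0.5          # vectorized selection
total_vector = float(values[mask].sum())
t_vector = time.perf_counter() - start

print(f"tuple-at-a-time:  {t_tuple:.3f}s, sum={total_tuple:.1f}")
print(f"vector-at-a-time: {t_vector:.3f}s, sum={total_vector:.1f}")
```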
  • EMC Greenplum Data Computing Appliance: High Performance for Data Warehousing and Business Intelligence: An Architectural Overview
EMC Greenplum Data Computing Appliance: High Performance for Data Warehousing and Business Intelligence: An Architectural Overview. EMC Information Infrastructure Solutions.

Abstract: This white paper provides readers with an overall understanding of the EMC® Greenplum® Data Computing Appliance (DCA) architecture and its performance levels. It also describes how the DCA provides a scalable end-to-end data warehouse solution packaged within a manageable, self-contained data warehouse appliance that can be easily integrated into an existing data center. October 2010.

Copyright © 2010 EMC Corporation. All rights reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. All other trademarks used herein are the property of their respective owners. Part number: H8039.
  • Microsoft's SQL Server Parallel Data Warehouse Provides
    Microsoft’s SQL Server Parallel Data Warehouse Provides High Performance and Great Value Published by: Value Prism Consulting Sponsored by: Microsoft Corporation Publish date: March 2013 Abstract: Data Warehouse appliances may be difficult to compare and contrast, given that the different preconfigured solutions come with a variety of available storage and other specifications. Value Prism Consulting, a management consulting firm, was engaged by Microsoft® Corporation to review several data warehouse solutions, to compare and contrast several offerings from a variety of vendors. The firm compared each appliance based on publicly-available price and specification data, and on a price-to-performance scale, Microsoft’s SQL Server 2012 Parallel Data Warehouse is the most cost-effective appliance. Disclaimer Every organization has unique considerations for economic analysis, and significant business investments should undergo a rigorous economic justification to comprehensively identify the full business impact of those investments. This analysis report is for informational purposes only. VALUE PRISM CONSULTING MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT. ©2013 Value Prism Consulting, LLC. All rights reserved. Product names, logos, brands, and other trademarks featured or referred to within this report are the property of their respective trademark holders in the United States and/or other countries. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this report may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.
  • Hannes Mühleisen
Large-Scale Statistics with MonetDB and R. Hannes Mühleisen, DAMDID 2015, 2015-10-13.

About me: Postdoc at CWI, Database Architectures group, since 2012. Amsterdam is nice; we have open positions. Special interest in data management for statistical analysis; various research & software projects in this space.

Outline: Column Store / MonetDB introduction; connecting R and MonetDB; advanced topics ("R as a Query Language", "Capturing The Laws of Data Nature").

Column stores vs. row stores. Postgres, Oracle, DB2, etc. store the conceptual table row by row on disk:

    class         speed  flux
    NX            1      3
    Constitution  1      8
    Galaxy        1      3
    Defiant       1      6
    Intrepid      1      1

A column store keeps one array per attribute instead, which enables compression:

    class: NX, Constitution, Galaxy, Defiant, Intrepid
    speed: 1, 1, 1, 1, 1
    flux:  3, 8, 3, 6, 1

What is MonetDB? A strictly columnar OLAP RDBMS (SQL). Started by Martin Kersten and Peter Boncz ~1994. Free & open source, with active development ongoing: www.monetdb.org. (Peter A. Boncz, Martin L. Kersten, and Stefan Manegold. 2008. Breaking the memory wall in MonetDB. Communications of the ACM 51, 12 (December 2008), 77-85. DOI=10.1145/1409360.1409380)

MonetDB today: expanded C code; MAL "DB assembly" & optimisers; SQL-to-MAL compiler; memory-mapped files; automatic indexing.

Some MAL: optimisers run on MAL code; efficient column-at-a-time implementations.

    EXPLAIN SELECT * FROM mtcars;
    X_2 := sql.mvc();
    X_3:bat[:oid,:oid] := sql.tid(X_2,"sys","mtcars");
    X_6:bat[:oid,:dbl]
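The compression point is easy to reproduce outside the database. A small sketch in plain Python (not MonetDB internals): run-length encoding collapses the constant speed column to a single pair, while the varied flux column barely shrinks.

```python
# The starship table from the slides, stored column-wise.
rows = [
    ("NX", 1, 3), ("Constitution", 1, 8), ("Galaxy", 1, 3),
    ("Defiant", 1, 6), ("Intrepid", 1, 1),
]

# Column store: one array per attribute (MonetDB keeps these in BATs).
ship_class = [r[0] for r in rows]
speed      = [r[1] for r in rows]
flux       = [r[2] for r in rows]

def rle(column):
    """Run-length encode a column as [(value, run_length), ...]."""
    runs = []
    for v in column:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

print(rle(speed))  # [(1, 5)]: five identical values, one pair
print(rle(flux))   # [(3, 1), (8, 1), (3, 1), (6, 1), (1, 1)]: no gain
```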
  • LDBC Introduction and Status Update
8th Technical User Community meeting - LDBC. Peter Boncz, [email protected], ldbcouncil.org.

LDBC Organization (non-profit): "sponsors" + non-profit members (FORTH, STI2) & personal members; task forces and volunteers developing benchmarks; TUC: Technical User Community (8 workshops, ~40 graph and RDF user case studies, 18 vendor presentations).

What does a benchmark consist of? Four main elements:
  • data & schema: defines the structure of the data
  • workloads: defines the set of operations to perform
  • performance metrics: used to measure (quantitatively) the performance of the systems
  • execution & auditing rules: defined to assure that the results from different executions of the benchmark are valid and comparable
Software as open source (GitHub): data generator, query drivers, validation tools, ...

Audience:
  • For developers facing graph processing tasks: a recognizable scenario to compare merits of different products and technologies
  • For vendors of graph database technology: a checklist of features and performance characteristics
  • For researchers, both industrial and academic: challenges in multiple choke-point areas such as graph query optimization and (distributed) graph analysis

SPB scope: The scenario involves a media/publisher organization that maintains semantic metadata about its journalistic assets (articles, photos, videos, papers, books, etc.), also called Creative Works. The Semantic Publishing Benchmark simulates consumption of RDF metadata (Creative Works) and updates of RDF metadata, related to annotations. Aims to be
  • Procurement and Spend Fact and Dimension Modeling
Procurement and Spend Dimension and Fact Job Aid

Contents: Introduction; Procurement and Spend - Purchase Orders Subject Area; Procurement and Spend - Purchase Requisitions Subject Area; Procurement and Spend - Purchase Receipts Subject Area.

Introduction. The purpose of this job aid is to provide an explanation of dimensional data modeling and of using dimensions and facts to build analyses within the Procurement and Spend subject areas.

Dimensional Data Model. The dimensional model is comprised of a fact table and many dimension tables, and it is used for calculating summarized data. Since Business Intelligence reports are used to measure the facts (aggregates) across various dimensions, dimensional data modeling is the preferred modeling technique in a BI environment. STARS, the OBI data model, is based on dimensional modeling. The underlying database tables are separated into fact tables and dimension tables, and the dimension tables are joined to fact tables with specific keys. This data model is usually called a star schema. The star schema separates business process data into facts, which hold the measurable, quantitative data about the business, and dimensions
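To make the fact/dimension vocabulary concrete, here is a toy star schema in Python with pandas. The table and column names are invented for the illustration and are not the job aid's actual subject areas.

```python
import pandas as pd

# Dimension tables: descriptive attributes, one row per member.
dim_supplier = pd.DataFrame({
    "supplier_key":  [1, 2],
    "supplier_name": ["Acme", "Globex"],
})
dim_date = pd.DataFrame({
    "date_key": [20240101, 20240201],
    "month":    ["2024-01", "2024-02"],
})

# Fact table: measurable, quantitative data, keyed to the dimensions.
fact_purchases = pd.DataFrame({
    "supplier_key": [1, 1, 2, 2],
    "date_key":     [20240101, 20240201, 20240101, 20240201],
    "amount":       [100.0, 250.0, 75.0, 300.0],
})

# A typical BI query: join facts to dimensions on their keys, then
# aggregate the measure across dimension attributes.
report = (fact_purchases
          .merge(dim_supplier, on="supplier_key")
          .merge(dim_date, on="date_key")
          .groupby(["supplier_name", "month"])["amount"]
          .sum()
          .reset_index())
print(report)
```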
  • The Database Architectures Research Group at CWI
The Database Architectures Research Group at CWI. Martin Kersten, Stefan Manegold, Sjoerd Mullender. CWI, Amsterdam, The Netherlands ([email protected]). http://www.cwi.nl/research-groups/Database-Architectures/

1. INTRODUCTION. The Database research group at CWI was established in 1985. It has steadily grown from two PhD students to a group of 17 people at the end of 2011. The group is supported by a scientific programmer and a system engineer to keep our machines running. In this short note, we look back at our past and highlight the multitude of topics being addressed.

2. THE MONETDB ANTHOLOGY. The workhorse and focal point for our research is MonetDB, an open-source columnar database system. Its development goes back as far as the early eighties when our first relational kernel, called Troll, was shipped as an open-source product. It spread to

power provided by the early MonetDB implementations. In the years following, many technical innovations were paired with strong industrial maturing of the software base. Data Distilleries became a subsidiary of SPSS in 2003, which in turn was acquired by IBM in 2009. Moving MonetDB Version 4 into the open-source domain required a large number of extensions to the code base. It became of the utmost importance to support a mature implementation of the SQL-03 standard, and the bulk of application programming interfaces (PHP, JDBC, Python, Perl, ODBC, RoR). The result of this activity was the first official open-source release in 2004. A very strong XQuery front-end was developed with partners and released in 2005 [1].
  • Micro Adaptivity in Vectorwise
Micro Adaptivity in Vectorwise. Bogdan Răducanu (Actian, [email protected]), Peter Boncz (CWI, [email protected]), Marcin Żukowski (Snowflake Computing, marcin.zukowski@snowflakecomputing.com).

ABSTRACT. Performance of query processing functions in a DBMS can be affected by many factors, including the hardware platform, data distributions, predicate parameters, compilation method, algorithmic variations and the interactions between these. Given that there are often different function implementations possible, there is a latent performance diversity which represents both a threat to performance robustness if ignored (as is usual now) and an opportunity to increase the performance if one would be able to use the best performing implementation in each situation. Micro Adaptivity, proposed here, is a framework that keeps many alternative function implementations ("flavors") in a system. It uses a

re-arrange the shape of a query plan in reaction to its execution so far, thereby enabling the system to correct mistakes in the cost model, or to e.g. adapt to changing selectivities, join hit ratios or tuple arrival rates. Micro Adaptivity, introduced here, is an orthogonal approach, taking adaptivity to the micro level by making the low-level functions of a query executor more performance-robust. Micro Adaptivity was conceived inside the Vectorwise system, a high performance analytical RDBMS developed by Actian Corp. [18]. The raw execution power of Vectorwise stems from its vectorized query executor, in which each operator performs actions on vectors-at-a-time, rather than a single tuple-at-a-time; where each vector is an array of (e.g.
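A toy rendition of the "flavors" idea from the abstract: several interchangeable implementations of one primitive, with the fastest one chosen by measurement rather than fixed at compile time. The real system drives this with a learning algorithm over vectorized primitives; this sketch just times each flavor once on a sample.

```python
import random
import time

def select_gt_comprehension(values, threshold):
    # flavor 1: list comprehension
    return [v for v in values if v > threshold]

def select_gt_filter(values, threshold):
    # flavor 2: built-in filter() with a closure
    return list(filter(lambda v: v > threshold, values))

FLAVORS = [select_gt_comprehension, select_gt_filter]

def pick_flavor(sample, threshold):
    """Time every flavor on a sample and return the fastest one."""
    timings = []
    for flavor in FLAVORS:
        start = time.perf_counter()
        flavor(sample, threshold)
        timings.append((time.perf_counter() - start, flavor.__name__, flavor))
    return min(timings)[2]

data = [random.random() for _ in range(1_000_000)]
best = pick_flavor(data[:100_000], 0.5)   # calibrate on a slice
print(best.__name__, len(best(data, 0.5)))
```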
  • Master Data Management (MDM) Improves Information Quality to Deliver Value
    MIT Information Quality Industry Symposium, July 15-17, 2009 Master Data Management (MDM) Improves Information Quality to Deliver Value ABSTRACT Master Data Management (MDM) reasserts IT’s role in responsibly managing business-critical information—master data—to reasonable quality and reliability standards. Master data includes information about products, customers, suppliers, locations, codes, and other critical business data. For most commercial businesses, governmental, and non-governmental organizations, master data is in a deplorable state. This is true although MDM is not new—it was, is, and will be IT’s job to remediate and control. Poor-quality master data is an unfunded enterprise liability. Bad data reduces the effectiveness of product promotions; it causes unnecessary confusion throughout the organization; it misleads corporate officers, regulators and shareholders; and it dramatically increases the IT cost burden with ceaseless and fruitless spending for incomplete error remediation projects. This presentation describes—with examples—why senior IT management should launch an MDM process, what governance they should institute for master data, and how they can build cost-effective MDM solutions for the enterprises they serve. BIOGRAPHY Joe Bugajski Senior Analyst Burton Group Joe Bugajski is a senior analyst for Burton Group. He covers IT governance, architecture, and data – governance, management, standards, access, quality, and integration. Prior to Burton Group, Joe was chief data officer at Visa responsible worldwide for information interoperability. He also served as Visa’s chief enterprise architect and resident entrepreneur covering risk systems and business intelligence. Prior to Visa, Joe was CEO of Triada, a business intelligence software provider. Before Triada, Joe managed engineering information systems for Ford Motor Company.
  • HMI + CLDS = New SDSC Center
New Technologies for Data Management. Chaitan Baru, 2013 Summer Institute: Discover Big Data, August 5-9, San Diego, California. San Diego Supercomputer Center at the University of California, San Diego.

Why new technologies?
  • Big Data characteristics: Volume, Velocity, Variety
  • Began as a Volume problem
    ▫ E.g. Web crawls …
    ▫ 1's-100's of PB in a single cluster
  • Velocity became an issue
    ▫ E.g. clickstreams at Facebook
    ▫ 100,000 concurrent clients
    ▫ 6 billion messages/day
  • Variety is now important
    ▫ Integrate, fuse data from multiple sources
    ▫ Synoptic view; solve complex problems

Varying opinions on Big Data:
  • "It's all about velocity. The other issues are old problems."
  • "Variety is the important characteristic."
  • "Big Data is nothing new."

Expanding enterprise systems: from OLTP to OLAP to "Catchall".
  • OLTP: OnLine Transaction Processing
    ▫ E.g., point-of-sale terminals, e-commerce transactions, etc.
    ▫ High throughput; high rate of "single shot" read/write operations
    ▫ Large number of concurrent clients
    ▫ Robust, durable processing
  • OLAP: decision support systems requiring complex query processing
    ▫ Fast response times for complex SQL queries
    ▫ Queries over historical data
    ▫ Fewer concurrent clients
    ▫ Data aggregated from multiple systems; multiple business systems, e.g.
  • Columnar Storage in SQL Server 2012
Columnar Storage in SQL Server 2012. Per-Åke Larson, Eric N. Hanson, Susan L. Price ([email protected]).

Abstract: SQL Server 2012 introduces a new index type called a column store index and new query operators that efficiently process batches of rows at a time. These two features together greatly improve the performance of typical data warehouse queries, in some cases by two orders of magnitude. This paper outlines the design of column store indexes and batch-mode processing and summarizes the key benefits this technology provides to customers. It also highlights some early customer experiences and feedback and briefly discusses future enhancements for column store indexes.

1 Introduction. SQL Server is a general-purpose database system that traditionally stores data in row format. To improve performance on data warehousing queries, SQL Server 2012 adds columnar storage and efficient batch-at-a-time processing to the system. Columnar storage is exposed as a new index type: a column store index. In other words, in SQL Server 2012 an index can be stored either row-wise in a B-tree or column-wise in a column store index. SQL Server column store indexes are "pure" column stores, not a hybrid, because different columns are stored on entirely separate pages. This improves I/O performance and makes more efficient use of memory. Column store indexes are fully integrated into the system. To improve performance of typical data warehousing queries, all a user needs to do is build a column store index on the fact tables in the data warehouse.
  • User's Guide for Oracle Data Visualization
    Oracle® Fusion Middleware User's Guide for Oracle Data Visualization 12c (12.2.1.4.0) E91572-02 September 2020 Oracle Fusion Middleware User's Guide for Oracle Data Visualization, 12c (12.2.1.4.0) E91572-02 Copyright © 2015, 2020, Oracle and/or its affiliates. Primary Author: Hemala Vivek Contributing Authors: Nick Fry, Christine Jacobs, Rosie Harvey, Stefanie Rhone Contributors: Oracle Business Intelligence development, product management, and quality assurance teams This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable: U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs) and Oracle computer documentation or other Oracle data delivered to or accessed by U.S. Government end users are "commercial computer software" or "commercial computer software documentation" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations.