Building Star Schema with SAS® Software


Building Star Schema With SAS® Software
An Introduction to a Data Warehouse Data Structure
Mark Shephard, [email protected]
SeUGI 18, Dublin, June 20-23, 2000

Building Star Schema...
- Basics
- Review of query mechanisms
- Performance
- A look at metadata
- Exploiting SAS/AF® classes
- Abstractions from the basic model
- More metadata, another class...

Star Schema Basics
- Few, additive facts
- Facts described by Dimensions:
  - unique business key on each row
  - arbitrary keys - dates an exception
  - unknown data has a valid key
  - appropriate key lengths
- Facts selected by constraining dimensions

Star Schema Basics
[Figure: a central fact table whose rows carry the dimension foreign keys (K', K2, K3, K4, K5) and the fact value ($), surrounded by dimension tables, each holding a key (K) and descriptive columns (V1-V6).]

Query Mechanisms
- Two query phases:
  - constrain dimension keys to select particular rows from the fact table;
  - use fact row foreign keys to recover additional dimensional information
- SAS offers both SQL and Datastep:
  - does it matter which we choose?

Sample SQL

  proc sql;
    create table results as
      select *
        from work.fact_tab
        where keyA = (select keyA from work.dimA where varA = 'keyA= 3')
          and keyB = (select keyB from work.dimB where varB = 'keyB= 9')
          and keyC = (select keyC from work.dimC where varC = 'keyC= 22');
  quit;

Datastep Segment

  data results;
    _iorc_ = 0;
    set dimA (where = (varA = 'keyA= 3'));
    do while (_iorc_ = 0);
      set fact_tab key = keyA;
      if _iorc_ = 0 then do;
        do while (_iorc_ = 0);
          set dimB key = keyB / unique;
          if _iorc_ = 0 and varB = 'keyB= 9' then do;
            do while (_iorc_ = 0);
              set dimC key = keyC / unique;
              if _iorc_ = 0 and varC = 'keyC= 22' then do;
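The slide shows only the opening of the step. Below is a minimal sketch of how such a step might be completed into something runnable - the table names, key variables and filter values are the hypothetical ones from the slides (fact_tab assumed indexed on keyA; dimB and dimC assumed uniquely indexed on their keys), and the loop structure is simplified relative to the slide's nested do-whiles; the author's full version may differ.

  /* Sketch only: a completed keyed-lookup query, illustrative names.  */
  data results;
    set dimA (where = (varA = 'keyA= 3'));   /* each qualifying dimA row */
    do while (1);
      set fact_tab key = keyA;               /* next fact row for keyA   */
      if _iorc_ ne 0 then leave;             /* matches exhausted        */
      set dimB key = keyB / unique;          /* recover dimB row...      */
      if _iorc_ = 0 and varB = 'keyB= 9' then do;
        set dimC key = keyC / unique;        /* ...and dimC row, test it */
        if _iorc_ = 0 and varC = 'keyC= 22' then output;
      end;
      _error_ = 0;                           /* clear failed-lookup flag */
    end;
    _error_ = 0;
  run;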
Query Performance
- SQL:        5.99  6.20  6.25  6.20  6.09 secs
- Datastep#1: 2.41  2.37  2.41  2.41  2.41 secs
- Datastep#2: 1.20  1.26  1.32  1.32  1.26 secs
- 66,000-row fact table; Pentium 100; Win95
- Additional dimension data can be retrieved concurrently
- Datasteps are faster and tunable...

Improving Performance
- Fix the order of dimension processing:
  - choose first the dimension that will return the fewest rows from the fact table
  - requires the processing of dimensions before the fact table
- Assumes dimension value distribution is the same as foreign key distribution in the fact table...

Realizing data with classes
- LMC: Logical Metadata Class
  - providing the user's view on the Schema
- QMC: Query Metadata Class
  - encapsulating the user's query
- QEC: Query Engine Class
  - generating an instance of the user's query
  - performing query optimization
- SAS/Warehouse Administrator™

Realizing data with classes
[Figure: the user interface captures a query in the QMC; the LMC supplies the logical metadata; the QEC combines the two to generate the query code, e.g. "data Facts; set Facts key = kvar / unique;".]

Abstractions
- 'Effective periods' and slowly moving dimensions...
- 'AND' operators between values
- 'Navigational' dimensions
- Joining schemas
- 'Fact-less' schemas
- Hierarchy support, multiple passes, ...

Effective Periods
- Folks get married, situations change, etc.
- Manage in the dimension tables:
  - retain keys;
  - add 'effective dates';
- Query complexity rises:
  - all but rules out SQL
- Ensure the 'truthfulness' of a query... (see the sketch after the figure below)

Effective Period
[Figure: a fact row and three dimension rows, each spanning its own WHEF-WHET (effective-from/effective-to) interval on a time axis; the effective period of the query's answer is the interval over which all four rows overlap.]
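As a concrete illustration of the 'truthfulness' test, here is a minimal sketch in PROC SQL - the names (a trans_date on the fact table, whef/whet effective dates on the dimension) are hypothetical, not from the paper. For a single dimension the date test is still easy to state; the slide's warning bites when every dimension in a query carries its own period and the tests must be nested and their periods intersected:

  /* Sketch only: one date-qualified dimension, illustrative names.  */
  proc sql;
    create table results as
      select f.*
        from work.fact_tab as f
           , work.dimB     as d
        where d.varB = 'keyB= 9'                       /* user constraint */
          and f.keyB = d.keyB                          /* key join        */
          and f.trans_date between d.whef and d.whet;  /* truthfulness    */
  quit;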
'AND' operations
- Required when a single fact foreign key needs to describe a combination of values
  - e.g. multiple covers on an insurance policy
- Only useful for the selection of data:
  - can't resolve which member of the combination is responsible for what proportion of the fact
  - the number of actually occurring combinations is the critical factor

'AND' operations
[Figure: a dimension table, a 'link' table and a fact table. Two dimension rows have been selected by the user; the link table translates dimension rows to columns, returning a single key wherever the selected combination is valid; the fact rows carrying those link-table keys are then selected.]

Further abstractions
- Partitioning physical dimensions:
  - improving update performance
    - targeted indexing
  - improved file space usage
  - sympathetic to the user view - putting data where the user expects to find it
- PMC: Physical Metadata Class
  - organization and management of the 'real' datasets

Further abstractions
[Figure: physical datasets are grouped into "physical" dimensions, which are in turn combined into the logical dimensions presented to the user.]

Closing thoughts
- The SAS® System offers a number of facilities to build and extend Star Schema structures.
- Metadata is the key to providing an interface users will use, in combination with the functionality they want.
- Organize metadata carefully - use SAS/Warehouse Administrator™.

Acknowledgements
SAS is a registered trademark of SAS Institute Inc., Cary, NC, USA.
SAS/AF is a registered trademark of SAS Institute Inc., Cary, NC, USA.
SAS/Warehouse Administrator is a trademark of SAS Institute Inc., Cary, NC, USA.
All other brand and product names are trademarks or registered trademarks of the respective companies.

Mark Shephard, [email protected]


Building Star Schema With SAS® Software
An Introduction to a Data Warehouse Data Structure

Mark Shephard
Sound Marketing, Hindhead, UK

This paper discusses the creation of star schema data structures as a store for detailed data within a Data Warehouse. Using the SAS® System throughout, as data loading mechanism, storage medium and exploitation tool, an efficient and capable Data Warehouse can be created to enable exploratory analysis of large volumes of detailed data. A number of abstractions from the familiar structure are made, exploiting the facilities of the SAS® System to better meet our requirements.

The Star Schema is a very popular mechanism for the storage of data within a data warehouse. There are any number of books and conference papers expounding its virtues as a means for enabling multi-dimensional analysis of often fairly detailed data. What you may have noticed, if you have read any of this literature, is the absence of a discussion of a star schema built using the SAS® System as the primary data store. This paper redresses that balance.

Commonly the data warehouse data store is built using a mainstream OLTP database. Similarly the incumbent star schema is queried using SQL. This imposes a number of restrictions on the function and capability of the warehouse, largely because the schema design has to closely adhere to the limitations of the database and particularly those of SQL. Here we describe a warehouse data structure built entirely from SAS® System software, enabling the construction of a data store that is both functionally rich and generically capable.

Star Schema Basics

A brief recap of the basics of the star schema structure is perhaps appropriate, if only to standardise on terms and nomenclature. This we'll do with the aid of Figure 1.

[Figure 1: The basic star schema (generic) - a central fact table whose rows hold the dimension foreign keys and the fact value, with the dimension tables radiating from it.]

The primary component of the star schema is the fact table. Clustered around the fact table are dimension tables, appearing as the 'rays' emanating from the star. Typically the fact table is a highly normalised structure. Each of its columns contains either a dimension table key or the information or 'fact' that we require. The purpose of the dimension tables is to describe the 'fact' in the fact table. Each dimension table is therefore de-normalised, allowing the values within it to be browsed, thereby enabling the simplest possible mechanism for identifying a fact. A fact is completely described by the foreign keys associated with it in its row of the fact table. Joins should be made between the dimension tables and the fact table only - not between one dimension table and another. The keys used to relate the tables…
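To make the recap concrete, here is a toy version of the structure just described, built as SAS datasets - a sketch with illustrative names and values only (reusing the deck's dimA/fact_tab naming), not code from the paper. The dimension is de-normalised and browsable, with a unique index on its key; the fact table holds only foreign keys and the additive fact, indexed to support the keyed lookups shown earlier:

  /* A toy one-dimension star. Names and data are illustrative.      */
  data work.dimA (index = (keyA / unique));   /* browsable dimension  */
    infile datalines dsd;
    input keyA varA :$20.;
    datalines;
  1,Retail - North
  2,Retail - South
  3,Wholesale - Export
  ;

  data work.fact_tab (index = (keyA));        /* keys + additive fact */
    input keyA amount;
    datalines;
  1 45.99
  1 12.50
  3 1086
  ;

  /* Constrain the dimension, then select the matching facts. */
  proc sql;
    create table results as
      select f.amount, d.varA
        from work.fact_tab as f, work.dimA as d
        where f.keyA = d.keyA
          and d.varA like 'Retail%';
  quit;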