Leveraging DB2 10 for High Performance of Your Data Warehouse

Total Page:16

File Type:pdf, Size:1020Kb

Leveraging DB2 10 for High Performance of Your Data Warehouse Front cover IBM® DB2® IBM® Information Management Software Leveraging DB2 10 for High Performance of Your Data Warehouse Comprehensive platform for architected data warehouse solutions Faster time to value for business decision making Standards-based for flexibility and change Whei-Jen Chen Scott Andrus Bhuvana Balaji Enzo Cialini Michael Kwok Roman B. Melnyk Jessica Rockwood ibm.com/redbooks International Technical Support Organization Leveraging DB2 10 for High Performance of Your Data Warehouse January 2014 SG24-8157-00 Note: Before using this information and the product it supports, read the information in “Notices” on page ix. First Edition (January 2014) This edition applies to DB2 for Linux, UNIX, and Windows Version 10.5. © Copyright International Business Machines Corporation 2014. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents Notices . ix Trademarks . x Preface . xi Authors . xi Acknowledgements . xiv Now you can become a published author, too! . xv Comments welcome. xv Stay connected to IBM Redbooks . xv Part 1. Overview . 1 Chapter 1. Gaining business insight with IBM DB2 . 3 1.1 Current business challenges . 4 1.2 Big data and the data warehouse . 5 1.2.1 Data warehouse infrastructure . 6 1.3 High performance warehouse with DB2 for Linux, UNIX, and Windows. 7 Chapter 2. Technical overview of IBM DB2 Warehouse. 9 2.1 DB2 Warehouse solutions. 10 2.1.1 The component groups . 10 2.1.2 Editions . 11 2.2 Expert systems . 17 2.2.1 PureData for operational analytics . 18 2.2.2 DB2 Warehouse components . 19 2.2.3 Database management system . 22 2.3 DB2 Warehouse topology . 25 Chapter 3. Warehouse development lifecycle . 29 3.1 Defining business requirements . 30 3.2 Building the data model. 31 3.2.1 Defining the physical data model . 31 3.2.2 Creating the physical data model . 32 3.2.3 Working with diagrams . 36 3.2.4 Using the Diagram Editor . 38 3.2.5 Optimal attributes of the physical data model . 39 3.2.6 Model analysis . 40 3.2.7 Deploying the data model . 42 3.2.8 Maintaining model accuracy . 45 © Copyright IBM Corp. 2014. All rights reserved. iii 3.3 Matching the technology to the requirements . 48 Part 2. Technologies . 51 Chapter 4. Column-organized data store with BLU Acceleration. 53 4.1 Simple and fast with BLU Acceleration . 54 4.1.1 Dynamic in-memory columnar technology . 54 4.1.2 Actionable compression . 55 4.1.3 Parallel vector processing . 55 4.1.4 Data skipping . 56 4.2 Getting started with BLU Acceleration. 56 4.2.1 Capacity planning . 57 4.2.2 Storage requirements . 58 4.3 Creating a column-organized data store . 58 4.3.1 Single configuration setting for analytic workloads . 59 4.3.2 Fine-tuning a configuration . 60 4.3.3 Creating column-organized tables . 61 4.3.4 Converting to column-organized tables. 63 4.4 Loading a column-organized data store . 63 4.4.1 Synopsis tables and data skipping . 64 4.4.2 Compression of column-organized tables. 65 4.4.3 Reducing load times . 67 4.5 Managing a column-organized data mart . 68 4.5.1 Query execution plans and the new CTQ operator. 68 4.5.2 Database maintenance . 70 Chapter 5. Row-based data store. 71 5.1 Scale-out solution: Database Partitioning Feature . 72 5.1.1 Partition . 72 5.1.2 Partition group. 73 5.1.3 Table spaces in a DPF environment . 74 5.1.4 Partitioning keys . 74 5.1.5 Partition maps . 75 5.1.6 Choosing partitioning keys . 76 5.1.7 Concluding remarks . 78 5.2 Scale-up solution: Intrapartition parallelism. 79 5.3 Other features for row-based data warehouse . 81 5.3.1 Compression . 81 5.3.2 Multidimensional data clustering . 83 5.3.3 Range partitioning . 87 Chapter 6. Data movement and transformation . 91 6.1 Introduction . 92 6.1.1 Load . 92 iv Leveraging DB2 10 for High Performance of Your Data Warehouse 6.1.2 Ingest . 94 6.1.3 Continuous data ingest . 96 6.2 DB2 Warehouse SQL Warehousing Tool . 98 6.2.1 Architecture . 99 6.2.2 Development environment . 100 6.2.3 Moving from development to production . 101 6.2.4 Runtime environment . 102 Chapter 7. Monitoring . 105 7.1 Understanding monitor elements . 106 7.1.1 Monitor element collection levels . 108 7.1.2 New monitoring elements for column-organized tables . 110 7.2 Using DB2 table functions to monitor your database in real time . 111 7.2.1 Monitoring requests. 111 7.2.2 Monitoring activities. 112 7.2.3 Monitoring data objects . 112 7.2.4 Monitoring locks . 113 7.2.5 Monitoring system memory . 113 7.2.6 Monitoring routines . ..
Recommended publications
  • Xquery in IBM DB2 10 for Z/OS
    ® XQuery in IBM DB2 10 for z/OS Introduction In November 2011, support was added for XQuery in IBM® DB2® 10 for z/OS® through APARs PM47617 and PM47618. XQuery extends XPath with new language constructs, adding expressiveness to the IBM pureXML® features of DB2. Many XQuery queries can be rewritten using a combination of XPath and SQL/XML, but often XQuery is the preferred tool, as it is an easier to code and more readable alternative. Like XPath, XQuery is a W3C standard, and its addition is a further step in ensuring compatibility within the DB2 family of products. XQuery has been available in DB2 for Linux, UNIX, and Windows since 2006. Constructs of XQuery XQuery introduces new types of expressions that can be used in XMLQUERY, XMLTABLE, and XMLEXISTS function calls. The XMLQUERY function is used in the SELECT clause of an SQL query to specify which XML data to retrieve. The XMLTABLE function is used in the FROM clause to extract XML data in a table format. The XMLEXISTS function is used in the WHERE clause to specify under which conditions to retrieve the data. XQuery expressions are most applicable with XMLQUERY, because arguments to XMLEXISTS can be specified by XPath expressions, which are indexable, while XQuery expressions are not. XMLQUERY is not indexable. XQuery can be used in XMLTABLE as the row expression or column expressions, but is usually not needed for the relational result. XQuery could be used to construct new XML values for the XML column results. There are three types of expressions that can be used alone or in combination: FLWOR expressions A FLWOR expression is a loop construct for manipulating XML documents in an application-like manner.
    [Show full text]
  • Best Practices Managing XML Data
    ® IBM® DB2® for Linux®, UNIX®, and Windows® Best Practices Managing XML Data Matthias Nicola IBM Silicon Valley Lab Susanne Englert IBM Silicon Valley Lab Last updated: January 2011 Managing XML Data Page 2 Executive summary ............................................................................................. 4 Why XML .............................................................................................................. 5 Pros and cons of XML and relational data ................................................. 5 XML solutions to relational data model problems.................................... 6 Benefits of DB2 pureXML over alternative storage options .................... 8 Best practices for DB2 pureXML: Overview .................................................. 10 Sample scenario: derivative trades in FpML format............................... 11 Sample data and tables................................................................................ 11 Choosing the right storage options for XML data......................................... 16 Selecting table space type and page size for XML data.......................... 16 Different table spaces and page size for XML and relational data ....... 16 Inlining and compression of XML data .................................................... 17 Guidelines for adding XML data to a DB2 database .................................... 20 Inserting XML documents with high performance ................................ 20 Splitting large XML documents into smaller pieces ..............................
    [Show full text]
  • Mongodb - Document-Oriented Store
    CONTEXT-AWARE SERVICE REGISTRY: MODELING AND IMPLEMENTATION ALAA ALSAIG A THESIS IN THE DEPARTMENT OF COMPUTER SCIENCEAND SOFTWARE ENGINEERING PRESENTEDIN PARTIAL FULFILLMENTOFTHE REQUIREMENTS FORTHE DEGREEOF MASTEROF COMPUTER SCIENCE CONCORDIA UNIVERSITY MONTRÉAL,QUÉBEC,CANADA NOVEMBER 2013 c ALAA ALSAIG, 2013 CONCORDIA UNIVERSITY School of Graduate Studies This is to certify that the thesis prepared By: Miss. Alaa AbdulBasit Alsaig Entitled: Context-Aware Service Registry: Modeling and Implementation and submitted in partial fulfillment of the requirements for the degree of Master in Applied Science (Software Engineering) complies with the regulations of the University and meets the accepted standards with respect to originality and quality. Signed by the final examining committee: ______________________________________ Chair Dr. N. Tsantalis ______________________________________ Examiner Dr. N. Shiri ______________________________________ Examiner Dr. D. Goswami ______________________________________ Co-supervisor Dr. V. S. Alagar ______________________________________ Co-supervisor Dr. M. Mohammad Approved by ________________________________________________ Chair of Department or Graduate Program Director ________________________________________________ Dr. Christopher W. Trueman, Interim Dean Faculty of Engineering and Computer Science Date ________________________________________________ Abstract Context-aware Service Registry: Modeling and Implementation Alaa Alsaig Modern societies have become very dependent on information and
    [Show full text]
  • DB2 10.5 with BLU Acceleration / Zikopoulos / 349-2
    Flash 6X9 / DB2 10.5 with BLU Acceleration / Zikopoulos / 349-2 DB2 10.5 with BLU Acceleration 00-FM.indd 1 9/17/13 2:26 PM Flash 6X9 / DB2 10.5 with BLU Acceleration / Zikopoulos / 349-2 00-FM.indd 2 9/17/13 2:26 PM Flash 6X9 / DB2 10.5 with BLU Acceleration / Zikopoulos / 349-2 DB2 10.5 with BLU Acceleration Paul Zikopoulos Sam Lightstone Matt Huras Aamer Sachedina George Baklarz New York Chicago San Francisco Athens London Madrid Mexico City Milan New Delhi Singapore Sydney Toronto 00-FM.indd 3 9/17/13 2:26 PM Flash 6X9 / DB2 10.5 with BLU Acceleration / Zikopoulos / 349-2 McGraw-Hill Education books are available at special quantity discounts to use as premiums and sales promotions, or for use in corporate training programs. To contact a representative, please visit the Contact Us pages at www.mhprofessional.com. DB2 10.5 with BLU Acceleration: New Dynamic In-Memory Analytics for the Era of Big Data Copyright © 2014 by McGraw-Hill Education. All rights reserved. Printed in the Unit- ed States of America. Except as permitted under the Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of pub- lisher, with the exception that the program listings may be entered, stored, and exe- cuted in a computer system, but they may not be reproduced for publication. All trademarks or copyrights mentioned herein are the possession of their respective owners and McGraw-Hill Education makes no claim of ownership by the mention of products that contain these marks.
    [Show full text]
  • History Repeats Itself: Sensible and Nonsensql Aspects of the Nosql Hoopla C
    History Repeats Itself: Sensible and NonsenSQL Aspects of the NoSQL Hoopla C. Mohan IBM Almaden Research Center 650 Harry Road San Jose, CA 95120, USA +1 408 927 1733 [email protected] ABSTRACT In this paper, I describe some of the recent developments in the 1. INTRODUCTION Over the last few years, many new types of database management database management area, in particular the NoSQL phenomenon systems (DBMSs) have emerged which are being labeled as and the hoopla associated with it. The goal of the paper is not to NoSQL systems. These systems have different design points do an exhaustive survey of NoSQL systems. The aim is to do a compared to the relational DBMSs (RDBMSs), like DB2 and broad brush analysis of what these developments mean - the good Oracle, which have now existed for about 3 decades with SQL as and the bad aspects! Based on my more than three decades of their query language. The NoSQL wave was initially triggered not database systems work in the research and product arenas, I will by the traditional software product companies but by the non- outline what are many of the pitfalls to avoid since there is traditional, namely the Web 2.0, companies like Amazon, currently a mad rush to develop and adopt a plethora of NoSQL Facebook, Google and Yahoo. Some of the NoSQL systems systems in a segment of the IT population, including the research which have emerged from such companies are BigTable, community. In rushing to develop these systems to overcome Cassandra [14], Dynamo and HBase.
    [Show full text]
  • Software Withdrawal: IBM DB2 Linux , UNIX and Windows and IBM Informix Dynamic Server Product Packaging Update, Release and Feature
    IBM Canada Ltd. Withdrawal Announcement A09-0256, dated February 10, 2009 Software withdrawal: IBM DB2 Linux , UNIX and Windows and IBM Informix Dynamic Server product packaging update, release and feature Table of contents 1 Overview 7 Technical support 4 Withdrawn products 8 Order processing 8 Corrections Overview Effective on the dates listed below, IBM® will withdraw from marketing, part numbers from the following product releases licensed under the IBM International Program License Agreement: Program VRM Withdrawal Program release name number from marketing date 5724-N82 9.1.0 05/13/09 DB2® High Availability Feature for WSE V9.1.0 5724-N81 9.1.0 05/13/09 DB2 Query Optimization Feature for WSE V9.1.0 5724-N81 9.5.0 05/13/09 DB2 Query Optimization Feature for WSE V9.5.0 5724-Q35 9.1.0 05/13/09 DB2 pureXMLTM Feature for ESE V9.1.0 5724-Q35 9.5.0 05/13/09 DB2 pureXML Feature for ESE V9.5.0 5724-Q33 9.1.0 05/13/09 DB2 pureXML Feature for EXE V9.1.0 5724-Q33 9.5.0 05/13/09 DB2 pureXML Feature for EXE V9.5.0 5724-Q34 9.1.0 05/13/09 DB2 pureXML Feature for WSE V9.1.0 5724-Q34 9.5.0 05/13/09 DB2 pureXML Feature for WSE V9.5.0 5724-N79 9.1.0 05/13/09 IBM HFF for DB2 ESE V9.1.0 5724-N79 9.5.0 05/13/09 IBM HFF for DB2 ESE V9.5.0 5724-R19 9.1.0 05/13/09 IBM HFF for DB2 EXE V9.1.0 5724-R19 9.5.0 05/13/09 IBM HFF for DB2 EXE V9.5.0 5724-R18 9.1.0 05/13/09 IBM HFF for DB2 WSE V9.1.0 5724-R18 9.5.0 05/13/09 IBM HFF for DB2 WSE V9.5.0 5724-L23 10.00 05/13/09 IBM Informix® Dynamic Server HDR Workgroup Edition 5724-L23 11.10 09/30/09 IBM Informix Dynamic
    [Show full text]
  • Multi-Model Databases: a New Journey to Handle the Variety of Data
    0 Multi-model Databases: A New Journey to Handle the Variety of Data JIAHENG LU, Department of Computer Science, University of Helsinki IRENA HOLUBOVA´ , Department of Software Engineering, Charles University, Prague The variety of data is one of the most challenging issues for the research and practice in data management systems. The data are naturally organized in different formats and models, including structured data, semi- structured data and unstructured data. In this survey, we introduce the area of multi-model DBMSs which build a single database platform to manage multi-model data. Even though multi-model databases are a newly emerging area, in recent years we have witnessed many database systems to embrace this category. We provide a general classification and multi-dimensional comparisons for the most popular multi-model databases. This comprehensive introduction on existing approaches and open problems, from the technique and application perspective, make this survey useful for motivating new multi-model database approaches, as well as serving as a technical reference for developing multi-model database applications. CCS Concepts: Information systems ! Database design and models; Data model extensions; Semi- structured data;r Database query processing; Query languages for non-relational engines; Extraction, trans- formation and loading; Object-relational mapping facilities; Additional Key Words and Phrases: Big Data management, multi-model databases, NoSQL database man- agement systems. ACM Reference Format: Jiaheng Lu and Irena Holubova,´ 2019. Multi-model Databases: A New Journey to Handle the Variety of Data. ACM CSUR 0, 0, Article 0 ( 2019), 38 pages. DOI: http://dx.doi.org/10.1145/0000000.0000000 1.
    [Show full text]
  • IDUG EU 2008 Paul Zikopoulos: IBM Data Studio: the Toolset of The
    Session: G10 and G1 (Double session) IBM Data Studio: The Toolset of the Future Title… Here Today! Paul Zikopoulos, BA, MBA IBM Canada 15th October 2008 • (1) 13:30-4:30 (2) 14:45-15:45 Platform: Tools and Utilities 1 Presenter’s Biography Paul C. Zikopoulos, BA, MBA, is an award- winning writer and speaker with IBM. In his current role, he is the Program Director of the DB2 Technical Evangelist team. He has more than ten years of experience with DB2 UDB and has written over sixty magazine articles and several books about it. Paul has co-authored the books: DB2 9 New Features, DB2 Version 8: The Official Guide, DB2: The Complete Reference, DB2 Fundamentals Certification for Dummies, DB2 for Dummies, and A DBA's Guide to Databases on Linux. Paul is a DB2 Certified Advanced Technical Expert (DRDA and Cluster/EEE) and a DB2 Certified Solutions Expert (Business Intelligence and Database Administration). In his spare time, he enjoys all sorts of sporting activities, running with his dog Chachi, and trying to figure out the world according to Chloë – his new daughter. You can reach him at: [email protected]. 2 2 The Application Development Lifecycle The People. The Framework. The Tasks Application Database Database Administrator Developer Developer Manage Develop •Database administration •Data management •Coding •Change management •Debugging •Recovery management •Teaming •Storage management •Testing •Performance management •Tuning Design Govern •Data Modeling •Security access •Process Modeling •Security analysis •Business Modeling •Data auditing/encryption
    [Show full text]
  • Xquery Rewrite Optimization in IBM DB2 Purexml
    XQuery Rewrite Optimization in IBM R DB2 R ∗pureXMLTM Fatma Ozcan¨ Normen Seemann Ling Wang IBM Almaden Research Center IBM Silicon Valley Lab IBM Silicon Valley Lab 650 Harry Road, San Jose 555 Bailey Road, San Jose 555 Bailey Road, San Jose Abstract In this paper, we describe XQuery compilation and rewrite optimization in DB2 pureXML, a hybrid relational and XML database management system. DB2 pureXML has been designed to scale to large collections of XML data. In such a system, effective filtering of XML documents and efficient execution of XML navigation are vital for high throughput. Hence the focus of rewrite optimization is to consoli- date navigation constructs as much as possible and to pushdown comparison predicates and navigation constructs into data access to enable index usage. In this paper, we describe the new rewrite transfor- mations we have implemented specifically for XQuery and its navigational constructs. We also briefly discuss how some of the existing rewrite transformations developed for the SQL engine are extended and adapted for XQuery. 1 Introduction XML has emerged in the industry as the predominant mechanism for representing and exchanging structured and semi-structured information across the Internet, between applications, and within an intranet. Key benefits of XML are its vendor and platform independence and its high flexibility. With the proliferation of XML data, several XML management systems [7, 10, 17, 5, 4, 6, 12, 11, 14] have been developed over the last couple of years. All major database vendors have released XML extensions to their relational engines, in addition to many native XML management systems.
    [Show full text]
  • DB2 Purexml - Advanced of Data Storage in the Relational - Hierarchical Structures
    Pawel DRZYMALA, Henryk WELFLE Lodz University of Technology DB2 pureXML - advanced of data storage in the relational - hierarchical structures Abstract. The article presents pureXML technology that offers advanced features to store, process and manage XML data in its native hierarchical format. By integrating XML data and relational database structure, users can take full advantage relational data management features. PureXML technology introduced in DB2 to support XML native storage. These are the relational-hierarchical database. Paper presents XQuery and SQL/XML query language that support XML and pureXML data type. Streszczenie. W artykule przedstawiono technologię pureXML, która oferuje zaawansowane funkcje do przechowywania, przetwarzania i zarządzania danymi XML w jego oryginalnym formacie hierarchicznym. Dzięki integracji danych XML i relacyjnej struktury bazy danych, użytkownicy mogą w pełni wykorzystać ich możliwości. PureXML wprowadzono w DB2 w celu natywnego wsparcia przechowywania XML. Jest to relacyjno- hierarchiczna baza danych. Przedstawiono XQuery, SQL/XML, obsługę XML i typ danych pureXML. (DB2 pureXML - zaawansowanie składowanie danych w strukturach relacyjno-hierarchicznych). Keywords: DB2 pureXML, IBM DB2, relational-hierarchical structures. Słowa kluczowe: DB2 pureXML, IBM DB2, struktury relacyjno-hierarchiczne. doi:10.12915/pe.2014.03.36 Introduction customerinfo DB2 pureXML offers advanced features to store, process and manage XML data in its native hierarchical format. By integrating XML data and relational database structure, users can take full advantage relational data management features. PureXML, the new technology introduced in DB2 to support XML native storage. These name addr phone are the relational-hierarchical database. A data model, consisting of nodes of several types linked through ordered parent/child relationships to form a hierarchy.
    [Show full text]
  • Compare the Distributed DB2 10.5 Database Servers
    Compare the distributed DB2 10.5 database servers Amyris Rada January 2015 Senior Information Developer, DB2 for Linux, UNIX, and (First published 04 November 2013) Windows IBM Roman Melnyk, B. DB2 Information Development IBM Toronto Lab In a side-by-side comparison table, the authors make it easy to understand the basic licensing rules, functions, and feature differences among the members of the distributed DB2® 10.5 for Linux®, UNIX®, and Windows® server family as of June 14, 2013. Introduction The DB2 10.5 product family consists of six priced editions, one separately priced feature, and one no-charge package. The purpose of this article is to help you to understand the differences among them. The article also outlines the new capabilities that are available in DB2 10.5, such as BLU Acceleration, DB2 pureScale enhancements, SQL compatibility enhancements, and simplified product packaging. • DB2 with BLU Acceleration combines advanced, innovative capabilities to accelerate analytic workloads for databases and data warehouses. DB2 with BLU Acceleration also integrates with IBM Cognos® Business Intelligence to provide reporting and deeper analysis. • High availability disaster recovery (HADR) support for DB2 pureScale environments, increased availability, improved workload balancing, and database restoration from DB2 pureScale instances to regular DB2 instances are among the enhancements that are introduced in DB2 10.5. • New enhancements in SQL compatibility, such as support for large row sizes and the exclusion of NULL keys from indexes, reduce the complexity of enabling applications to run in a DB2 environment. • Changes to product packaging increase value because more capabilities and features, such as warehouse functionality, are included in the base DB2 editions.
    [Show full text]
  • IDUG NA 2009 Paul Zikopoulos: IBM Data Studio: the Toolset of The
    Session: E05 IBM Data Studio: The Toolset of the Future... Here Today (A Tour of the Free Stuff) Paul Zikopoulos, BA, MBA IBM Canada May 13th, 2009 • 9:45 a.m. – 10:45 a.m. Platform: All Speaker Bio Paul C. Zikopoulos, BA, MBA is the Program Director for the DB2 Evangelist team at IBM. He is an award-winning writer and speaker with more than 14 years of experience with DB2. Paul has written more than 230 magazine articles and 11 books on DB2 including, Information on Demand: Introduction to DB2 9.5 New Features, DB2 9 Database Administration Certification Guide and Reference (6th Edition), DB2 9: New Features, Information on Demand: Introduction to DB2 9 New Features, Off to the Races with Apache Derby, DB2 Version 8: The Official Guide, DB2: The Complete Reference, DB2 Fundamentals Certification for Dummies, DB2 for Dummies, and A DBA's Guide to Databases on Linux. Paul is a DB2 Certified Advanced Technical Expert (DRDA and Clusters) and a DB2 Certified Solutions Expert (BI and DBA). In his spare time, he enjoys all sorts of sporting activities, including running with his dog Chachi, avoiding punches in his MMA training, and trying to figure out the world according to Chloë – his daughter. You can reach him at: [email protected]. 2 The People Cost We’re Not Getting Marginally Cheaper 3 The Application Development Lifecycle Maintaining Alignment is About COMMUNICATION 4 Integrated Data Management Maximizing value from requirements to retirement Mitigate risk of data breach with Improve data quality and software database encryption governance
    [Show full text]