VLDB - An Analysis of DB2 at Very Large Scale - D13


Austin Clifford, IBM
Session Code: 2130
Fri, May 18, 2012 (08:00 AM - 09:00 AM) | Platform: DB2 for LUW - II

Presentation Objectives

1) Design & implementation of a VLDB.
2) Benefits and best-practice use of DB2 Warehouse features.
3) Ingesting data into a VLDB.
4) Approach & considerations for scaling out a VLDB as the system grows.
5) Management and problem diagnosis of a VLDB.

Disclaimer

● © Copyright IBM Corporation 2012. All rights reserved.
● U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
● THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.
● IBM, the IBM logo, ibm.com, and DB2 are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published.
Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml

What is a Very Large Database?

A very large database, or VLDB, is a database that contains an extremely high number of tuples (database rows), or occupies an extremely large amount of physical filesystem storage. The most common definition of a VLDB is a database that occupies more than 1 terabyte.

Agenda

● VLDB Mission
● What is a PetaByte?
● Building a PetaByte System
● Shared Nothing Architecture
● Database Design
● Data Generation & ETL
● Workload & Testing
● Performance Monitoring
● Expanding the System
● Useful Tips for VLDB
● Conclusion

VLDB Mission

● Increasing demands from real-time analytics are placing additional pressure on warehouse systems.
● Demonstrate the performance and scalability of DB2 and its complementary products at the petabyte scale.
● Simulate heavy on-peak analytics in parallel with other essential system functions such as data ingest and backup and recovery.
● Guide best practices and future product direction.
● Develop techniques for massive-scale rapid data generation.

Digital Data 101 – What is a Petabyte?

● 1 Bit = Binary Digit
● 8 Bits = 1 Byte
● 1024 Bytes = 1 Kilobyte
● 1024 Kilobytes = 1 Megabyte
● 1024 Megabytes = 1 Gigabyte
● 1024 Gigabytes = 1 Terabyte
● 1024 Terabytes = 1 Petabyte
● 1024 Petabytes = 1 Exabyte
● 1024 Exabytes = 1 Zettabyte
● 1024 Zettabytes = 1 Yottabyte
● 1024 Yottabytes = 1 Brontobyte
● 1024 Brontobytes = 1 Geopbyte

The Building Blocks

We start with the storage:
● 1 disk = 450GB
● 1PB of DB data = raw data + RAID + contingency = 1.6PB
● 4,608 disks × 450GB = 1.6PB
● Disks are housed in EXP5000 enclosures; each EXP5000 holds 16 disks: 4,608 / 16 = 288 enclosures.
● The EXP5000s need DS5300 storage controllers to manage the I/O activity (1 DS5300 per 18 EXP5000s): 288 / 18 = 16 controllers.

That's the storage done - now we need to drive the system with servers:
● 16 clusters, following the Smart Analytics guideline of 4 p550 servers per cluster.
● Each cluster is attached to 1 DS5300.
● 64 servers in total.

The communication between devices:
● Juniper network switches for the copper networks.
● IBM SAN switches for the fiber networks.
● Server control for the 64 servers is managed by the HMC (Hardware Management Console).

Expansion Groups

● 4 × p550 servers
● 1 × DS5300
● 18 × EXP5000 (6 + 12)

Hardware Summary

Full VLDB deployment:
● Smart Analytics-like configuration
● 64 p550 servers
● 16 DS5300 storage controllers
● 288 EXP5000 disk enclosures
● 4,608 disks (450GB each -> 1.6PB)
● 8 IBM SAN switches (24p/40p)
● 7 Juniper network switches (48p)
● 2 HMCs
● 6km of copper cables
● 2km of fiber cables
● Occupies 33 fully loaded racks
● Latest “free cooling” designs are incorporated into the lab

Free Cooling

● 6 CRAC (Computer Room Air Con) units in the VLDB lab
● Ireland's favourable (?) climate yields significant savings on computer-room cooling
● As long as the outside air temperature is below 9.5 degrees C, 100% of the cooling of the room is by fresh air
● Over a full year, 80% of the cooling is fresh-air provisioned

Software Stack

● DB2 Server 9.7 Fix Pack 5
● IBM General Parallel File System (GPFS™) 3.3.0.14
● IBM Tivoli System Automation for Multi-Platforms 3.1.1.3
● IBM AIX 6.1 TL6 SP5
● IBM DS Storage Manager 10.60.G5.16

VLDB in the Flesh

[Photographs of the VLDB lab.]

Shared Nothing Architecture

[Diagram: a query ("select … from table") fans out via the Fast Communication Manager to an engine with its own data and log on each of partitions 1 through n.]

● Partitioned database model.
● The database is divided into 504 partitions.
● Partitions run on 63 physical nodes (8 partitions per host).
● Each partition server has dedicated resources.
● Parallel processing occurs on all partitions, coordinated by DB2.
● Single system image to the user and application.
● Hash partitioning provides the best parallelism and maximizes I/O capability.
● VLDB management (recovery, maintenance, etc.).
● Large scans automatically run in parallel...
● All nodes work together.
● Truly scalable performance: 504 partitions will complete the job in 1/504th of the time.
● Queries and utilities too (backup/restore, load, index build, etc.).

Mapping DB2 Partitions to Servers

[Diagram: partitions part0-part3, each with its own FCM listener, mapped across Node 1 and Node 2.]

● db2nodes.cfg is the DB2 instance configuration file (sqllib/db2nodes.cfg).
● All databases in the instance share this definition.
● The file is stored in the DB2 instance sqllib directory and shared to the other nodes via GPFS.
● Each entry specifies the partition number, the host name (or the IP address of the high-speed interconnect used for FCM communication), and the logical port:

    0 node1 0
    1 node1 1
    2 node2 0
    3 node2 1

Logical Design

● Star and snowflake schemas, sampled from production database artifacts.
● Larger dimension tables are often snowflaked.
● Fact tables have composite or non-unique primary keys; dimension tables have surrogate primary keys.
● Dimension foreign keys are indexed.
● VLDB: relationships inferred.
● All tables are compressed.

Sample Schema

● Star schema for the 4 largest fact tables.

Partition Group Design

● Small dimension tables are placed in a Single Database Partition Group (SDPG).
● Fact and large dimension tables are partitioned (e.g. PG_DISJOINT1, PG_DISJOINT2).
● Collocation of facts with the largest/most frequently joined dimension.
● VLDB: disjoint partition groups are used to drive FCM and table queues harder.
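As an illustrative sketch only — the group names, partition numbers, and tablespace/bufferpool names below are hypothetical, not the presenters' actual configuration — the partition-group layout described above might be created along these lines:

```sql
-- SDPG for small dimension tables, assuming partition 0 is the
-- catalog/coordinator partition (hypothetical choice).
CREATE DATABASE PARTITION GROUP sdpg ON DBPARTITIONNUM (0);

-- Two disjoint partition groups over the remaining data partitions, so
-- that joins across them must flow through FCM table queues.
CREATE DATABASE PARTITION GROUP pg_disjoint1 ON DBPARTITIONNUMS (1 TO 252);
CREATE DATABASE PARTITION GROUP pg_disjoint2 ON DBPARTITIONNUMS (253 TO 503);

-- Tablespaces are created in a partition group; tables placed in them
-- are spread over that group's partitions. A 16K pagesize needs a
-- matching bufferpool.
CREATE BUFFERPOOL bp16k SIZE AUTOMATIC PAGESIZE 16K;
CREATE TABLESPACE ts_fact_data IN DATABASE PARTITION GROUP pg_disjoint1
    PAGESIZE 16K MANAGED BY AUTOMATIC STORAGE BUFFERPOOL bp16k;
```

Placing the two largest fact tables in different disjoint groups is what forces the inter-group join traffic onto FCM, which is exactly the stress the VLDB project set out to exercise.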
Choosing the Partitioning Key

● Partitioning key (DISTRIBUTE BY HASH):
  ● A subset of the primary key.
  ● Fewer columns is better.
  ● A surrogate key with high cardinality is ideal.
● Collocation:
  ● Possible for tables with the same partitioning key.
  ● Collocate the fact table with its largest commonly joined dimension table.
  ● Consider adding a redundant column to the fact PK.
  ● Replicate other dimension tables.
  ● Trade-off between partition balancing and optimal collocation.
● Skew:
  ● Avoid skew of more than 10%.
  ● Avoid a straggler partition.

Check Skew

    -- Rows per partition (10% sample, scaled back up by 10)
    SELECT DBPARTITIONNUM(date_id) AS "Partition number",
           COUNT(1) * 10 AS "Total # records"
    FROM bi_schema.tb_sales_fact TABLESAMPLE SYSTEM (10)
    GROUP BY DBPARTITIONNUM(date_id);

    Partition number   Total # records
    ----------------   ---------------
                   1        10,313,750
                   2        10,126,900
                   3         9,984,910
                   4        10,215,840

    -- Space allocation per partition
    SELECT DBPARTITIONNUM, SUM(DATA_OBJECT_L_SIZE) AS SIZE_KB
    FROM SYSIBMADM.ADMINTABINFO
    WHERE TABSCHEMA = 'THESCHEMA'
      AND TABNAME = 'THETABLE'
    GROUP BY ROLLUP (DBPARTITIONNUM)
    ORDER BY 2;

Physical Design

● Separate tablespaces for:
  ● Staging tables
  ● Indexes
  ● MQTs
  ● Table data
● VLDB: typically, larger tables have a larger pagesize.
● Range Partitioning
  ● Most Fact and large
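As a rough sketch of how the distribution, collocation, and range-partitioning points above combine — all table, column, and tablespace names here are hypothetical, not the presenters' actual DDL:

```sql
-- Fact table: hash-distributed on a high-cardinality surrogate key that
-- is a subset of the PK, range-partitioned by month, and compressed.
CREATE TABLE bi_schema.tb_sales_fact (
    customer_id  BIGINT        NOT NULL,
    sale_date    DATE          NOT NULL,
    store_id     INTEGER       NOT NULL,
    amount       DECIMAL(12,2) NOT NULL
)
IN ts_fact_data INDEX IN ts_fact_ix
DISTRIBUTE BY HASH (customer_id)
PARTITION BY RANGE (sale_date)
    (STARTING '2012-01-01' ENDING '2012-12-31' EVERY 1 MONTH)
COMPRESS YES;

-- Collocated dimension: same partition group and same distribution key,
-- so fact/dimension joins on customer_id resolve locally per partition.
CREATE TABLE bi_schema.tb_customer_dim (
    customer_id  BIGINT       NOT NULL PRIMARY KEY,
    cust_name    VARCHAR(100) NOT NULL
)
IN ts_dim_data
DISTRIBUTE BY HASH (customer_id)
COMPRESS YES;
```

With this layout, joins on customer_id avoid FCM table-queue traffic; smaller dimensions that cannot share the distribution key would typically be replicated to each partition instead (e.g. via replicated MQTs), and monthly ranges allow data roll-in/roll-out with ALTER TABLE ... ATTACH/DETACH PARTITION.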