Analysis of MPP-Systems Stress Testing Based on Big Data

International Journal of Advances in Electronics and Computer Science, ISSN: 2393-2835, Volume-4, Issue-2, Feb.-2017, http://iraj.in

¹VASSILIY SERBIN, ²YEVGENIY GORBUNOV
¹,²Computer science, ²Information systems department, International Information Technology University
E-mail: [email protected], [email protected]

Abstract- The comparative analysis of stress testing of MPP (massively parallel processing) systems based on Big Data on different software and hardware systems is essential for high-performance computing, both in terms of optimizing the load on existing machines and from the point of view of new platform procurement policy. The objects of the study are 14 major SAS marketing campaigns of one of the banks of Kazakhstan. They were selected to assess the capabilities of the computer systems adequately. Additionally, with the help of specialized tools, the authors analyzed the characteristics of the studied systems of massively parallel architecture.

Keywords- Greenplum, Netezza, Exadata, SAS-campaign, Big Data, high-performance computing, stress testing, MPP-systems.

I. INTRODUCTION

Big Data is important in many areas, such as science, social media, and enterprise. Banking is an industry where Big Data technologies can be widely applied. Financial institutions have large data arrays, operate in a highly competitive environment, and possess sufficient budgets for IT innovation.

Michael Stonebraker [1] claims that many enterprises face the task of integrating a larger and larger number of data sources with diverse data (spreadsheets, Web sources, XML, traditional DBMSs). Leaders of Informatica and Cloudera see a solution in the Apache Hadoop data management platform, which is, in many ways, uniquely equipped to handle the volume, variety and velocity of unstructured data being generated within many businesses.

Wei Fan and Albert Bifet [2] stated that “Large quantities of useful data are getting lost since new data is largely untagged file based and unstructured data. The 2012 IDC study on Big Data explains that in 2012, 23% (643 exabytes) of the digital universe would be useful for Big Data if tagged and analyzed. However, currently only 3% of the potentially useful data is tagged, and even less is analyzed”.

According to AnHai Doan et al. [3], “Unstructured data has now permeated numerous real-world applications, in all domains. Consequently, managing such data is now an increasingly critical task, not just to our community, but also to many others, such as the Web, AI, KDD, and SIGIR communities”.

The problem is not only to store and manage Big Data, but also to process and retrieve useful information from it (Kapil Bakshi [4]).

Many organizations have progressively come to understand the significance of information as a vital asset. Although most customers are becoming acquainted with the Big Data concept, many people are still unsure about the expenses and profits of undertaking new Big Data ventures (Vivekananda Gopalkrishnan et al., [5]).

Although storage capabilities have grown significantly and data stores are available around the world, it is still very hard to capture and store big data efficiently and make it easily accessible (Jameela Al-Jaroodi et al., [6]).

Big Data is able to address almost all of the key tasks of banks, such as customer acquisition, improving service quality, assessment of borrowers, and combating fraud. By increasing the speed and quality of reporting, increasing the depth of analysis, and helping to combat the laundering of illicit funds, Big Data technologies help banks meet the requirements of financial regulators.

Data collected in banks are unstructured by nature. Unstructured data are the fastest growing type of data generated today. Experts estimate that 80 to 90% of the data in any organization is unstructured [7, 8]. The Big Data paradigm makes it possible to solve the problem of processing unstructured data. Unstructured data, such as text or numeric values, are not in a defined schema. To make sense of unstructured data, some sort of framework needs to be overlaid on the raw data to make it more like information. This is why Hadoop and similar tools are needed: they use key-value pairs to impose structure where there is none.

Methods for working with data do not improve at the rate at which data volumes grow, and they are less a product of technology than the result of scientific work; however, the emergence of big data problems has markedly accelerated research. Today the need to work with unstructured data has become urgent.

The goal of the research is to analyze stress testing of MPP-systems for Greenplum, Netezza, and Exadata based on Big Data SAS Campaign Management.

II. METHODS AND MATERIALS

Massively parallel processing (MPP) [9] is a class of parallel computing system architectures. A distinctive feature of the architecture is that memory is physically divided. The system is built from individual nodes, each comprising a processor, a local memory bank, communication processors and network adapters, and in some cases hard drives and other input-output devices. Only the processors of a given node have access to that node's random access memory bank. The nodes are connected by special communication channels. A user can determine the logical number of the processor to which he or she is connected and organize the exchange of messages with other processors. Two modes of operation for the operating system are used on machines of massively parallel architecture:

- In the first mode, the complete operating system runs only on the management machine (front-end), and each node runs a heavily abridged version of the operating system that supports the branch of the parallel application allocated to it.

- In the second mode, each node runs a full operating system, most often a UNIX-like system, which is installed separately.

Solutions from large companies, namely EMC Greenplum, IBM Netezza and Oracle Exadata, were selected as the objects of the study.

Greenplum Software is a company engaged in the development of a DBMS for data warehouses. The company specializes in Enterprise Data Cloud solutions for large-scale data warehousing and analytical systems. The Greenplum Database DBMS is based on a modified PostgreSQL database with massively parallel processing (MPP). Greenplum implemented MapReduce functionality and column-oriented organization of tables in its database as part of the so-called Polymorphic Data Storage technology [10].

Netezza is an American company that develops hardware and software data storage systems: relational database server clusters providing massively parallel processing. A distinctive feature of all Netezza systems is the use of programmable gate arrays in the data processing nodes, which provide compression and filtering of data and thus reduce the costs of storage and input-output operations during the execution of data selection queries. The company was founded in 2000; in 2010 it was acquired by IBM Corporation, and in 2011 it was fully integrated into the corporation. Since 2012 its hardware and software systems have been released under the name IBM PureData for Analytics [11].

Exadata is a line of hardware and software systems released commercially by Oracle Corporation. From 2008 to mid-2009 it was based on Hewlett-Packard server hardware, and later on hardware from the acquired Sun Microsystems. The system is a cluster of database management servers based on Oracle RAC technology, delivered as pre-assembled 42-unit telecommunication racks filled with servers, storage nodes, and InfiniBand or Ethernet switches [12].

Competitors have also noted that the simultaneous focus on OLTP and OLAP processing makes the systems less effective for analytical processing, on which similar solutions from Teradata and Netezza concentrate. In particular, they point to the non-optimality of symmetrical access from all servers to all storage nodes (symmetrical parallelism), as opposed to the complete separation of data between nodes in competing analytical systems with massively parallel processing [13].

III. ANALYSIS ON STRESS TESTING IN TEST CAMPAIGNS

In this research, stress testing of the Greenplum, Netezza, Exadata, and Oracle systems on test campaigns is analyzed. Testing was carried out by launching sets of test campaigns ("packs") of 7 campaigns each:

- the first pack ("odd") is the launch of 7 test campaigns in parallel. The campaigns are started at the same time in the Execute mode with the "Test" option chosen beforehand. Then all 7 campaigns are awaited until completion.

- the second pack ("even") is the launch of the remaining 7 test campaigns in parallel. The campaigns are started in the same way as in the first pack.

Performance graphs for the systems used in the analysis are given below. The graphs share an identical time scale, which makes it possible to compare them with each other. During the tests, statistics were gathered on the servers at 30-second intervals.

Graph of Running Task. A "task" is a computational task started in the system: either an SQL query or a SAS stored process (Fig. 1, Fig. 2, Fig. 3).

Fig. 1. Graphic Running Task. Netezza

Fig. 2. Graphic Running Task. Greenplum

Fig. 5. Greenplum Segment Node

The graph of the Exadata Storage Cell Node is constructed with NMON Analyser from the data collected by the nmon utility on one of the Storage servers and on one of the Database servers (Fig. 6).

Fig. 6. Exadata Storage Cell Node

The graph of Netezza CPU Aggregate is constructed from the data of the Excel monitoring utility.
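
The pack procedure described in Section III can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' tooling: the names `split_into_packs`, `run_pack`, `run_packs`, and `sample_offsets` are hypothetical, `run_campaign` stands in for whatever actually starts a SAS campaign in Execute mode with the "Test" option, and the split into "odd" and "even" packs by campaign position is a guess at what those labels mean.

```python
import concurrent.futures
import time
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def split_into_packs(campaign_ids: List[int]) -> Dict[str, List[int]]:
    """Split campaigns into two packs by position (assumed meaning of "odd"/"even")."""
    return {
        "odd": campaign_ids[0::2],   # 1st, 3rd, 5th, ... campaign
        "even": campaign_ids[1::2],  # 2nd, 4th, 6th, ... campaign
    }


def run_pack(campaign_ids: List[int], run_campaign: Callable[[int], T]) -> Dict[int, T]:
    """Start every campaign of one pack in parallel and wait until all complete."""
    results: Dict[int, T] = {}
    workers = max(1, len(campaign_ids))  # one worker per campaign
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(run_campaign, cid): cid for cid in campaign_ids}
        for future in concurrent.futures.as_completed(futures):
            results[futures[future]] = future.result()
    return results


def run_packs(packs: Dict[str, List[int]], run_campaign: Callable[[int], T]) -> Dict[str, Dict[int, T]]:
    """Run the packs one after another; a pack starts only after the previous one has finished."""
    return {name: run_pack(ids, run_campaign) for name, ids in packs.items()}


def sample_offsets(duration_s: float, interval_s: int = 30) -> List[int]:
    """Offsets in seconds from test start at which server statistics would be read."""
    return list(range(0, int(duration_s) + 1, interval_s))


if __name__ == "__main__":
    def fake_campaign(campaign_id: int) -> float:
        """Hypothetical stand-in for a real campaign run; returns elapsed seconds."""
        started = time.monotonic()
        time.sleep(0.01 * campaign_id)
        return time.monotonic() - started

    packs = split_into_packs(list(range(1, 15)))  # 14 campaigns, two packs of 7
    for pack_name, durations in run_packs(packs, fake_campaign).items():
        print(pack_name, sorted(durations))
```

Threads are used here only because the stand-in campaign is I/O-like (it waits); a real harness would more likely submit jobs to the campaign management system and poll for completion, while a separate collector such as nmon records server statistics on the 30-second grid that `sample_offsets` illustrates.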