Does Big Data Mean Big Storage?
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
-
Greenplum Database Performance on VMware vSphere 5.5
Greenplum Database Performance on VMware vSphere 5.5. Performance Study. Technical White Paper. Table of Contents: Introduction; Experimental Configuration and Methodology; Test Bed Configuration; Test and Measurement Tools; Test Cases and Test Method; Experimental Results; Performance Comparison: Physical to Virtual ... -
Data Warehouse Fundamentals for Storage Professionals – What You Need to Know EMC Proven Professional Knowledge Sharing 2011
Data Warehouse Fundamentals for Storage Professionals – What You Need To Know. EMC Proven Professional Knowledge Sharing 2011. Bruce Yellin, Advisory Technology Consultant, EMC Corporation, [email protected]. Table of Contents: Introduction; Data Warehouse Background; What Is a Data Warehouse?; Data Mart Defined; Schemas and Data Models; Data Warehouse Design – Top Down or Bottom Up?; Extract, Transformation and Loading (ETL); Why You Build a Data Warehouse: Business Intelligence; Technology to the Rescue?; RASP - Reliability, Availability, Scalability and Performance; Data Warehouse Backups -
Hitachi Solution for Databases in Enterprise Data Warehouse Offload Package for Oracle Database with MapR Distribution of Apache
Hitachi Solution for Databases in an Enterprise Data Warehouse Offload Package for Oracle Database with MapR Distribution of Apache Hadoop. Reference Architecture Guide. By Shashikant Gaikwad, Subhash Shinde. December 2018. Feedback: Hitachi Data Systems welcomes your feedback. Please share your thoughts by sending an email message to [email protected]. To assist the routing of this message, use the paper number in the subject and the title of this white paper in the text. Revision History: MK-SL-131-00, Initial release, December 27, 2018. Table of Contents: Solution Overview; Business Benefits; High Level Infrastructure; Key Solution Components; Pentaho; Hitachi Advanced Server DS120; Hitachi Virtual Storage Platform Gx00 Models; Hitachi Virtual Storage Platform Fx00 Models; Brocade Switches; Cisco Nexus Data Center Switches; MapR Converged Data Platform; Red Hat Enterprise Linux; Solution Design; Server Architecture; Storage Architecture; Network Architecture; Data Analytics and Performance Monitoring Using Hitachi Storage Advisor; Oracle Enterprise Data Workflow Offload; Engineering Validation; Test Methodology; Test Results. Use this reference architecture guide to implement Hitachi Solution for Databases in an enterprise data warehouse offload package for Oracle Database. This Oracle converged infrastructure provides a high performance, integrated, solution for advanced analytics using the following big data applications: Hitachi Advanced Server DS120 with Intel Xeon Silver 4110 processors; Pentaho Data Integration; MapR distribution for Apache Hadoop. This converged infrastructure establishes best practices for environments where you can copy data in an enterprise data warehouse to an Apache Hive database on top of Hadoop Distributed File System (HDFS). -
Database Solutions on AWS
Database Solutions on AWS: Leveraging ISV AWS Marketplace Solutions. November 2016. Table of Contents: Introduction; Operational Data Stores and Real Time Data Synchronization; Data Warehousing; Data Lakes and Analytics Environments; Application and Reporting Data Stores; Conclusion. Introduction: Amazon Web Services has a number of database solutions for developers. An important choice that developers make is whether or not they are looking for a managed database or if they would prefer to operate their own database. In terms of managed databases, you can run managed relational databases like Amazon RDS which offers a choice of MySQL, Oracle, SQL Server, PostgreSQL, Amazon Aurora, or MariaDB database engines, scale compute and storage, Multi-AZ availability, and Read Replicas. You can also run managed NoSQL databases like Amazon DynamoDB -
EMC Secure Remote Services 3.18 Site Planning Guide
EMC® Secure Remote Services Release 3.26 Site Planning Guide, REV 01. Copyright © 2018 EMC Corporation. All rights reserved. Published in the USA. Published January 2018. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. The information in this publication is provided as is. EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. EMC², EMC, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners. For the most up-to-date regulatory document for your product line, go to Dell EMC Online Support (https://support.emc.com). Contents: Preface; Chapter 1, Overview; ESRS architecture; ESRS installation options; Other components; Requirements for ESRS customers; Supported devices -
MapR Spark Certification Preparation Guide
MapR Spark Certification Preparation Guide. By HadoopExam.com. Contents: About Spark and Its Demand; Core Spark; SparkSQL; Spark Streaming; GraphX; Machine Learning; Who should learn Spark?; About Spark Certifications; HandsOn Exam; Multiple Choice Questions -
Dell EMC IT Big Data Analytics Journey
Dell EMC IT Big Data Analytics Journey. Nagesh Madhwal, Client Solutions Director, Consulting, Southeast Asia, Dell EMC. Agenda: 1 Dell EMC IT Big Data Journey; 2 Building the Data Lake; 3 Marketing Science Lab Use Case; 4 Technical Benefits; 5 Lessons Learned; 6 Q&A. Dell - Internal Use - Confidential. Dell EMC IT Big Data Journey: A Journey Of Maturity. 1 AGGREGATE 2 LIBERATE 3 INNOVATE/ITERATE HARNESS Consolidation BA-as-a-Service Flexible / Scalable Analytics -based decision making Master Data Data Scientist Services Mission Critical Leveraging data to predict future models Common BI Tools Collaborative Analytic Tools Real Time Capable Transforming operations by BI Governance Unified Analytical Platform Collaborative Delivery applying analytics FOUNDATION ANALYTICS ENABLEMENT DATA LAKE ANALYTICS ENTERPRISE 2010 2011 2012 2013 2014 2015 2016. Building The Data Lake: PROCESS MONITOR THE MEASURE BUSINESS IMPROVE THE EXECUTION BUSINESS PERFORMANCE BUSINESS APPS ERP INNOVATE CRM ITERATE REFINE Master Data Workspace Analytics Machine BU App Data EMBED INTO BUSINESS APPS “MAKE THEM SMARTER” GOVERNANCE. Powered by Intel® Xeon® Processors. Dell EMC IT Data Lake Architecture: ANALYTICS TOOLBOX APPLICATIONS DATA GOVERNANCE APPLICATIONS COLLIBRA BATCH - DATA PLATFORM MICRO EXECUTION CASSANDRA POSTGRESQL MEMORY DB GEMFIRE PROCESS SPRING XD PIVOTAL HD GREENPLUM DB ATTIVIO BATCH APACHE APACHE RANGER INGESTION Social Media Sensor Network Web Supplier Market ERP CRM PLM UNSTRUCTURED STRUCTURED -
WhereScape RED for Pivotal Greenplum
WhereScape RED for Pivotal Greenplum. WhereScape RED is an agile data warehouse development and management solution that automates much of the data warehouse life cycle, from initial scoping, prototyping, loading and populating to ongoing management and optimization. In addition, WhereScape RED automates the creation and management of documentation, diagrams and lineage information. Optimized for Greenplum: WhereScape RED for Pivotal Greenplum is optimized to fully leverage the Greenplum Database. WhereScape RED accelerates time to value for your Greenplum investment by requiring fewer resources to model, build and deploy your data warehouse. Eliminating hand coding and automating Greenplum development creates a simplified infrastructure and dramatically reduces total cost of ownership. WhereScape RED “knows” all Greenplum objects, including views, distribution keys and append-only tables, and utilizes Greenplum’s rich feature set to build native Greenplum objects, document them and schedule data to be loaded. Utilizing the RED user interface, users can simply drag and drop to develop Greenplum objects: build tables, generate Greenplum SQL code to populate the tables, and create HTML documentation. RED’s open metadata architecture is stored in database tables for easy access and integrates with external testing and source control tools. WhereScape RED works seamlessly as an ELT (extract, load and transformation) using the Greenplum GPLOAD bulk load utility, the fast method for loading data into Greenplum. “Our results using WhereScape have been extremely impressive. WhereScape enabled us to design, develop, document and deploy a production-ready solution in 8 weeks. Using traditional data warehouse development methods would have taken us 6-8 months.” Ryan Fenner, VP, Data Solutions Architect, Union Bank -
Pivotal Greenplum Command Center Documentation | Pivotal GPCC Docs
Table of Contents: Pivotal Greenplum Command Center Documentation; About Pivotal Greenplum Command Center; Installing the Greenplum Command Center Software; Downloading and Running the Greenplum Command Center Installer; Setting the Greenplum Command Center Environment; Creating the gpperfmon Database; Upgrading Greenplum Command Center; Uninstalling Greenplum Command Center; Creating Greenplum Command Center Console Instances; Greenplum Command Center User Guide; Connecting to the Greenplum Command Center Console; Dashboard; Query Monitor; Host Metrics; Cluster Metrics; Monitoring Multiple Greenplum Database Clusters; History; System; Segment Status; Storage Status; Admin; Permission Levels for GPCC Access; Authentication; Workload Management; Administering Greenplum Command Center; About the Command Center Installation; Starting and Stopping Greenplum Command Center; Administering Command Center Agents; Administering the Command Center Database; Administering the Web Server; Configuring Greenplum Command Center; Enabling Multi-Cluster Support; Securing a Greenplum Command Center Console Instance; Configuring Authentication for the Command Center Console; Enabling Authentication with Kerberos; Securing the gpmon Database User; Utility Reference; gpcmdr; gpccinstall; Configuration File Reference; Command Center Agent Parameters; Command Center Console Parameters; Setup Configuration File; Greenplum Database Server Configuration Parameters. © Copyright Pivotal Software Inc, 2013-2017. 3.3.1. Pivotal Greenplum Command Center Documentation: Documentation for Pivotal Greenplum Command Center. About Greenplum Command Center: Pivotal Greenplum Command Center is a management tool for the Greenplum Big Data Platform. This section introduces key concepts about Greenplum Command Center and its components.
In the United States District Court for the Eastern District of Texas Tyler Division
Case 6:11-cv-00660-LED Document 1 Filed 12/08/11 Page 1 of 16 PageID #: 1. IN THE UNITED STATES DISTRICT COURT FOR THE EASTERN DISTRICT OF TEXAS, TYLER DIVISION. PersonalWeb Technologies LLC, Plaintiff, v. EMC Corporation and VMware, Inc., Defendants. Civil Action No. 6:11-cv-660. JURY TRIAL REQUESTED. COMPLAINT FOR PATENT INFRINGEMENT. Plaintiff PersonalWeb Technologies LLC files this Complaint for Patent Infringement against EMC Corporation and VMware Inc. (collectively, “Defendants”) and states as follows: THE PARTIES 1. Plaintiff PersonalWeb Technologies LLC (“PersonalWeb” or “Plaintiff”) is a limited liability company organized under the laws of Texas with its principal place of business at 112 E. Line Street, Suite 204, Tyler, Texas, 75702. PersonalWeb was founded in August 2010 and is in the business of developing and distributing software based on its technology assets. 2. PersonalWeb protects its proprietary business applications and operations through a portfolio of patents that it owns, including 13 issued and pending United States patents. PersonalWeb is assignee and owner of eight patents at issue in this action: U.S. Patent Nos. 5,978,791, 6,415,280, 6,928,442, 7,802,310, 7,945,539, 7,945,544, 7,949,662, and 8,001,096. 3. Defendant EMC Corporation (“EMC”) is a Massachusetts Corporation with its principal place of business at 176 South Street, Hopkinton, Massachusetts. EMC is qualified to do business in the state of Texas, Filing No. 0007347306, and has appointed CT Corporation System, 350 N Saint Paul St. -
EMC STRATEGY Journey to Cloud -Big Data
EMC STRATEGY: Journey to Cloud - Big Data. Agathi Galani, Indirect District Manager, Greece, Malta, Cyprus. 5th December 2011. © Copyright 2011 EMC Corporation. All rights reserved. EMC’s Mission: To Lead Customers On Their Journey To Hybrid Cloud Computing. The Journey to Your Cloud: Infrastructure. Private Cloud is the logical first step. Enterprise IT, Private Cloud, Public Cloud. Complex Trusted Simple Controlled Expensive Low Cost Inflexible Reliable Flexible Secure Siloed Dynamic. “70% Will Spend More On Private Cloud through 2012” (GARTNER DATA CENTER CONFERENCE 2009). The Journey To The Private Cloud: % Virtualized; Simplicity, Scalability, Efficiency, Continuity, Standardization, Protection, Security, Automation; IT Production (Infrastructure Focus), Business Production (Applications Focus), IT-as-a-Service (Business Focus). IT Production: Virtualize non-business-critical IT-owned applications. Challenges: Islands of infrastructure; CAPEX; Power. Approach: Consolidated infrastructure; Virtualized servers; Tiered SANs; Disk-based backup. Efficiency. EMC IT: IT Production Benefits Realized. EMC IT Department Efficiency Benefits Realized: $12M Power and Space Savings; $74M Data Center Equipment Savings; 170% Gain in Storage Admin Productivity; 34% Increase in Energy Efficiency; 60M Pounds of CO2 Reduced. Phase 1: IT-owned Apps. “VNXe is the easiest storage device we’ve ever used” (THE CITY OF SAFFORD). “Extremely well equipped, and starting at under $10,000 represents excellent value” (COMPUTER RESELLER NEWS). Simple. Efficient. Affordable. -
SQL Engines on Hadoop
Market Report. Paper by Bloor. Author: Philip Howard. Publish date: December 2017. SQL Engines on Hadoop. “It is clear that Impala, LLAP, Hive, Spark and so on, perform significantly worse than products from vendors with a history in database technology.” (Author: Philip Howard) Executive summary: Hadoop is used for a lot of different purposes and one major subset of the overall Hadoop market is to run SQL against Hadoop. This might seem contrary to Hadoop’s NoSQL roots, but the truth is that there are lots of existing investments in SQL applications that companies want to preserve; all the leading business intelligence and analytics platforms run using SQL; and SQL skills, capabilities and developers are readily available, which is often not the case for other languages. However, the market for SQL engines on Hadoop is not mono-cultural. There are multiple use cases for deploying SQL on Hadoop and there are more than twenty different SQL on Hadoop platforms. ... these are discussed in detail in this paper it is worth briefly explaining that SQL support has two aspects: the version supported (ANSI standard 1992, 1999, 2003, 2011 and so on) plus the robustness of the engine at supporting SQL queries running with multiple concurrent thread and at scale. Figure 1 illustrates an abbreviated version of the results of our research. This shows various leading vendors, and our estimates of their product’s positioning relative to performance and SQL support. Use cases are shown by the colour of each bubble but for practical reasons this means that no vendor/product is shown for more than two use cases, which is why we describe Figure ... “The key differentiators between products are the use cases they support, their