Driving the Future of Data Warehousing and Analytics
prepared for Dell

Greenplum/Dell Data Cloud Solution

Integrated Solution for Analytics, Data Mart Consolidation, and Large-Scale Data Warehousing

Dell and Greenplum deliver a best-of-breed solution to address the growing need for large-scale analytics and data warehousing. The solution provides a combination of performance, scalability, efficiency and flexibility that is unique to Greenplum and Dell. The Dell PowerEdge C2100’s balance of compute, memory, and storage resources makes it an ideal platform to take full advantage of Greenplum’s shared-nothing, MPP (Massively Parallel Processing) architecture.

Data-driven businesses around the world are leveraging Greenplum’s solutions and reaping the benefits of fast, flexible access to all their data for business intelligence and analytics. Over 30 trillion rows of data are being managed by Greenplum today. Organizations from a wide variety of industries rely on Greenplum Database to support their mission-critical business functions. In the financial services industry, for example, 5 billion shares are analyzed daily by stock exchanges and regulatory firms using Greenplum. Companies such as NASDAQ OMX, Nokia-Siemens Networks, NYSE Euronext, Reliance Communications, Ricoh, Skype, Sony, T-Mobile, and Vodafone rely on Greenplum to deliver the insights they require to drive their businesses forward.

Greenplum Database was purpose-built for large-scale, complex analytical and business intelligence workloads. A cluster of Dell PowerEdge C-Series servers running Greenplum operates as a single database supercomputer, automatically partitioning and parallelizing queries, to achieve performance that is tens or hundreds of times faster than traditional database platforms. When additional capacity is required, organizations can scale out cost-effectively by simply adding servers. With linear scalability, performance remains lightning quick even as data volumes grow to petabyte levels. This solution excels at large-scale warehousing and data mart consolidation.

Highlights
• Leave proprietary hardware-based data warehousing systems behind with a high-performance, scalable solution from Greenplum and Dell
• Add capacity and performance to keep pace with your business by deploying additional nodes as you need them
• Take advantage of the innovation of powerful x86-based servers and enterprise-class MPP database software
• Experience 10 to 100 times the performance of traditional data warehouse solutions
• Preserve scarce data center space and reduce power consumption by leveraging the efficiency of Dell’s PowerEdge C-Series servers

The Greenplum/Dell Data Cloud solution provides:

Extreme performance and scalability
• Each powerful Dell PowerEdge C-Series server in a cluster works simultaneously on every query and load operation, in parallel, contributing to best-in-class performance
• From low terabytes to the largest multi-petabyte data warehouses, scale is no longer a barrier
• Maintain industry-leading performance even as your data grows with linear horizontal scaling

www.greenplum.com

Elastic expansion and sandboxing
• Add Dell servers while the database is online for more storage capacity and performance (no costly appliance upgrades)
• Supports agile data warehousing practices: easily create private sandboxes for business analysts, using cost-effective Dell hardware

Massively parallel analytic processing
• Unified parallel engine supports SQL and MapReduce processing across hundreds or thousands of Intel Xeon CPU cores
• Comprehensive SQL support (SQL-92, SQL-99, SQL-2003 OLAP extensions)

Hardware Specifications
Component/Measure: Per Rack
C2100 servers: 18
CPU Cores: 216
RAM: 864GB to 2.6TB
Drives: 192
Usable capacity: 36TB to 123TB (or more with compression; depends on drive size/speed)
Power consumption: 9,600 VA
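The per-rack CPU and RAM figures in the hardware specifications are consistent with simple multiplication across the 18 servers. The short check below is only a sketch: the per-server values of 12 cores and 48GB are inferred here by dividing the rack totals by 18 and are not stated in this datasheet, while 144GB is the per-server maximum quoted under Expanded memory.

```python
# Worked check of the per-rack figures (a sketch, not vendor sizing guidance).
SERVERS_PER_RACK = 18          # from the hardware specifications

# Assumptions inferred by dividing the rack totals by 18 servers:
CORES_PER_SERVER = 12          # 216 cores / 18 servers
MIN_RAM_GB_PER_SERVER = 48     # 864 GB / 18 servers
# Stated in the datasheet as the maximum memory in one 2U server:
MAX_RAM_GB_PER_SERVER = 144

rack_cores = SERVERS_PER_RACK * CORES_PER_SERVER
rack_ram_min_gb = SERVERS_PER_RACK * MIN_RAM_GB_PER_SERVER
rack_ram_max_gb = SERVERS_PER_RACK * MAX_RAM_GB_PER_SERVER

print(f"CPU cores per rack: {rack_cores}")
print(f"RAM per rack: {rack_ram_min_gb} GB to {rack_ram_max_gb / 1000:.1f} TB")
```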

The Greenplum/Dell solution moves processing power as close as possible to the data, so processing always occurs in parallel, delivering unmatched query and load performance. Greenplum Database running on Dell PowerEdge C-Series servers is the industry’s fastest and most affordable high-end data warehousing combination, giving users the power to answer complex questions and to run, in seconds, analyses that used to take days with traditional solutions.

Solution Features

Core MPP architecture
Provides automatic parallelization of data and queries: all data is automatically partitioned across all nodes of the system, and queries are planned and executed using all nodes working together in a highly coordinated fashion.
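As a rough illustration of the shared-nothing idea, the sketch below spreads rows across nodes by hashing a key, lets each node total only its own rows, and combines the partial results. Everything in it is an assumption made for illustration: the four-node cluster, the CRC32 hash, and the function names are hypothetical and do not describe Greenplum’s actual distribution scheme or planner.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size


def node_for(key: str) -> int:
    """Pick a node by hashing the distribution key (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_NODES


def distribute(rows):
    """Partition (customer, amount) rows across nodes by customer."""
    nodes = defaultdict(list)
    for customer, amount in rows:
        nodes[node_for(customer)].append((customer, amount))
    return nodes


def local_sum(node_rows):
    """Each node totals only the rows it stores."""
    return sum(amount for _, amount in node_rows)


rows = [("alice", 10), ("bob", 25), ("carol", 5), ("alice", 40), ("dave", 20)]
nodes = distribute(rows)
# In a real cluster these partial sums would be computed in parallel.
partials = [local_sum(nodes[n]) for n in range(NUM_NODES)]
total = sum(partials)  # a coordinator combines the per-node results
print(total)
```

Because the same key always hashes to the same node, all rows for one customer stay together, which is what lets each node work independently.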

Multi-level fault tolerance
Utilizes multiple levels of fault tolerance and redundancy that allow the system to continue operating automatically in the face of hardware or software failures.

Online system expansion
Add C2100 servers to increase storage capacity, processing performance and loading performance. The database can remain online and fully available while the expansion process takes place in the background. Performance and capacity increase linearly and predictably as servers are added.

Intel Xeon 5500/5600 processors
Fast processing of even the most complex queries, with support for high concurrency and mixed workload environments.

Massive disk capacity and I/O throughput in an efficient footprint
12 high-capacity disks in each 2-rack-unit server and PERC H700 controllers supply maximum disk I/O throughput while preserving scarce data center real estate.

Industry-leading power efficiency
The Dell PowerEdge C-Series servers leverage Intel’s latest intelligent energy management technology, delivering up to 50 percent lower server idle power consumption compared to the previous generation of Intel processors.

Expanded memory
Up to 144GB of memory in just 2 rack units.


Fast interconnect
Choose 1 Gb or 10 Gb networking as the high-speed platform for Greenplum’s software-based intelligent interconnect.

Workload management
Provides administrative control over system resources and their allocation to queries. Users can be assigned to resource queues that manage the inflow of work to the database. The priority of running queries can also be adjusted.
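The idea of a resource queue managing the inflow of work can be sketched as a simple admission gate. The class below is hypothetical and is not Greenplum’s interface: it assumes a queue with a fixed limit on concurrently running queries, holds extra queries in arrival order, and leaves out priority adjustment.

```python
from collections import deque


class ResourceQueue:
    """Hypothetical admission queue: at most `active_limit` queries run at once."""

    def __init__(self, active_limit: int):
        self.active_limit = active_limit
        self.running = []
        self.waiting = deque()

    def submit(self, query_id: str) -> str:
        """Start the query if a slot is free, otherwise hold it in the queue."""
        if len(self.running) < self.active_limit:
            self.running.append(query_id)
            return "running"
        self.waiting.append(query_id)
        return "waiting"

    def finish(self, query_id: str) -> None:
        """Release a slot and admit the longest-waiting query, if any."""
        self.running.remove(query_id)
        if self.waiting:
            self.running.append(self.waiting.popleft())


adhoc = ResourceQueue(active_limit=2)
states = [adhoc.submit(q) for q in ("q1", "q2", "q3")]
adhoc.finish("q1")
print(states, adhoc.running)
```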

Petabyte-scale loading
High-performance loading utilizing MPP Scatter/Gather Streaming technology. Loading speeds scale with each additional C-Series server to greater than 4TB/hour.

Trickle micro-batching
When loading a continuous stream, trickle micro-batching allows data to be loaded at frequent intervals (e.g., every 5 minutes) while maintaining extremely high data ingest rates.
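A minimal sketch of the micro-batching pattern, assuming events arrive as (timestamp in seconds, payload) pairs and that each 5-minute window becomes one bulk load. The function name and event format are illustrative assumptions, not part of any Greenplum loader.

```python
from collections import defaultdict

BATCH_SECONDS = 5 * 60  # the 5-minute interval used as an example in the text


def micro_batches(events):
    """Group (timestamp_seconds, payload) events into fixed 5-minute batches."""
    batches = defaultdict(list)
    for ts, payload in events:
        window_start = (ts // BATCH_SECONDS) * BATCH_SECONDS
        batches[window_start].append(payload)
    # Return batches in time order, each ready for one bulk load.
    return [(start, batches[start]) for start in sorted(batches)]


events = [(10, "a"), (299, "b"), (300, "c"), (650, "d")]
for start, payloads in micro_batches(events):
    print(start, payloads)
```

Loading each window in one operation keeps the per-load overhead small compared with inserting every event individually.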

Anywhere data access
Allows queries to be executed from the database against external data sources, returning data in parallel, regardless of their location, format, or storage medium.

Hybrid storage & execution (row- and column-oriented)
For each table (or partition of a table), the DBA can select the storage, execution and compression settings that suit the way that table will be accessed. This includes the choice of row- or column-oriented storage & processing for any table or partition. Leverages Greenplum’s Polymorphic Data Storage™ technology.
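The difference between the two orientations can be shown with plain Python containers. This is only a conceptual sketch with hypothetical column names; it says nothing about how Greenplum lays data out on disk.

```python
# The same three-row table in two layouts (column names are hypothetical).
row_store = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 25.0},
    {"id": 3, "region": "east", "amount": 5.0},
]

# Column-oriented: one list per column, values aligned by position.
column_store = {
    name: [row[name] for row in row_store] for name in row_store[0]
}

# Fetching one whole record suits the row layout: a single lookup.
record = row_store[1]

# Aggregating one column suits the column layout: only "amount" is read.
total_amount = sum(column_store["amount"])

print(record)
print(total_amount)
```

Tables read mostly as whole records favor the row layout, while tables scanned for a few columns at a time favor the column layout, which is why the choice is made per table or partition.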

In-database compression
Utilizes industry-leading compression technology to increase performance and dramatically reduce the space required to store data. Customers can expect to see a 3-10x disk space reduction with a corresponding increase in effective I/O performance.
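The link between space reduction and effective I/O is simple arithmetic: if every byte read from disk expands to several bytes of table data, both the data held and the data scanned per second scale by the compression ratio. The sketch below applies the 3x and 10x ratios to the 36TB low-end usable capacity from the hardware specifications. The raw scan rate of 1,000 MB/s is an assumed figure for illustration only, and decompression cost is ignored.

```python
def with_compression(raw_capacity_tb: float, raw_scan_mb_s: float, ratio: float):
    """Return (effective capacity in TB, effective scan rate in MB/s)."""
    return raw_capacity_tb * ratio, raw_scan_mb_s * ratio


RAW_CAPACITY_TB = 36.0   # low end of per-rack usable capacity in the datasheet
RAW_SCAN_MB_S = 1000.0   # assumed raw scan rate, for illustration only

for ratio in (3, 10):
    capacity, scan = with_compression(RAW_CAPACITY_TB, RAW_SCAN_MB_S, ratio)
    print(f"{ratio}x: holds about {capacity:.0f} TB, scans about {scan:.0f} MB/s")
```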

Multi-level partitioning
Allows flexible partitioning of tables based on date, range or value. Partitioning is specified using DDL and allows an arbitrary number of levels. The query optimizer will automatically prune unneeded partitions from the query plan.
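Partition pruning can be pictured as filtering on partition keys before any rows are read. The sketch below assumes a hypothetical two-level layout (month, then region) held in a dictionary; the data and the `prune` helper are illustrative and do not reflect Greenplum’s DDL or optimizer internals.

```python
from datetime import date

# Hypothetical two-level partitioning: first by month, then by region.
partitions = {
    (date(2010, 1, 1), "emea"): [("2010-01-14", "emea", 120)],
    (date(2010, 1, 1), "apac"): [("2010-01-20", "apac", 75)],
    (date(2010, 2, 1), "emea"): [("2010-02-03", "emea", 60)],
    (date(2010, 2, 1), "apac"): [("2010-02-11", "apac", 90)],
}


def prune(partitions, month=None, region=None):
    """Keep only partitions whose keys can satisfy the query predicates."""
    return {
        key: rows
        for key, rows in partitions.items()
        if (month is None or key[0] == month)
        and (region is None or key[1] == region)
    }


# A query filtered to February and EMEA scans one partition out of four.
to_scan = prune(partitions, month=date(2010, 2, 1), region="emea")
print(f"scanning {len(to_scan)} of {len(partitions)} partitions")
```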

Indexes: B-tree, Bitmap, and more
Greenplum supports a range of index types including B-tree and Bitmap.

Comprehensive SQL
Comprehensive SQL-92 and SQL-99 support with SQL-2003 OLAP extensions. All queries are parallelized and executed across the entire system.

Native MapReduce
MapReduce has been proven as a technique for high-scale data analysis by Internet leaders such as Google and Yahoo. Greenplum natively runs MapReduce programs within its parallel engine.
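For readers unfamiliar with the pattern, the sketch below shows the generic map, shuffle, and reduce steps on a word count. It runs in a single Python process and is not Greenplum’s MapReduce program format; the function names are illustrative.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)


lines = ["big data big queries", "data warehouse"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

In a parallel engine the map and reduce steps run on many nodes at once, with the shuffle moving each key’s values to the node that reduces it.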


SQL 2003 OLAP extensions
Provides a fully parallelized implementation of SQL’s recently added OLAP extensions. Full standard support, including window functions, rollup, cube and a wide range of other expressive functionality.
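To show what two of these extensions compute, the sketch below reproduces in plain Python the results of a rollup (detail rows, per-region subtotals, and a grand total) and of a windowed running sum partitioned by region. The sample data and function names are hypothetical, and the code only imitates the semantics; in the database these would be expressed in SQL.

```python
from collections import defaultdict
from itertools import groupby

sales = [
    ("east", "disk", 10),
    ("east", "tape", 5),
    ("west", "disk", 20),
    ("west", "disk", 15),
]


def rollup(rows):
    """Totals per (region, product), per region, and overall, like ROLLUP."""
    totals = defaultdict(int)
    for region, product, amount in rows:
        totals[(region, product)] += amount   # detail level
        totals[(region, None)] += amount      # subtotal per region
        totals[(None, None)] += amount        # grand total
    return dict(totals)


def running_total(rows):
    """Running sum of amount within each region, in input order."""
    out = []
    by_region = sorted(rows, key=lambda r: r[0])  # stable sort keeps input order
    for region, group in groupby(by_region, key=lambda r: r[0]):
        acc = 0
        for _, product, amount in group:
            acc += amount
            out.append((region, product, amount, acc))
    return out


print(rollup(sales))
print(running_total(sales))
```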

Programmable analytics
Offers a new level of parallel analysis capabilities for mathematicians and statisticians, with support for R, linear algebra and machine learning primitives.

Client access & third-party tools
Supports standard database interfaces (SQL, ODBC, JDBC, OLEDB, etc.) and is fully supported and certified by a wide range of business intelligence (BI) and extract/transform/load (ETL) tools including Ab Initio, SAP/Business Objects DataServices and Web Intelligence, GoldenGate, IBM Cognos and DataStage, Informatica PowerCenter and PowerExchange, JasperSoft, MicroStrategy, SAS, Talend, and others.

Greenplum Performance Monitor
View the performance of your Greenplum Database system including system metrics and query details. The dashboard view allows you to monitor the system utilization during query runs. Drill down into a query’s detail and plan to understand its performance.

pgAdmin III for GPDB
pgAdmin III is the most popular and feature-rich open source administration and development platform for PostgreSQL. Greenplum Database 3.3 ships with an enhanced version of pgAdmin III that has been extended to work with Greenplum Database and provides full support for Greenplum-specific capabilities.

Contact Us

For more information about the Greenplum Data Warehouse Solution, visit www.greenplum.com.

About Greenplum

Greenplum is the pioneer of Enterprise Data Cloud™ solutions that bring the power of self-service to large-scale data warehousing and analytics, providing customers with flexible access to all their data for business intelligence and analytics. Greenplum offers industry-leading performance at a low cost for companies managing terabytes to petabytes of data. Data-driven businesses around the world, including NASDAQ OMX, NYSE Euronext, Reliance Communications, Skype and Fox Interactive Media/MySpace, have adopted the Greenplum Database to support their mission-critical business functions. For more information visit www.greenplum.com.

Corporate Headquarters
1900 South Norfolk Street
San Mateo, CA 94403 USA
tel: 650-286-8012
www.greenplum.com