GREENPLUM: DATA PROTECTION AND HIGH AVAILABILITY - A NEED FOR EVERY DATA WAREHOUSE

Sumit Gupta, Project Engineer, Aditya Infotech
Puneet Goyal, Senior Specialist, HCL Technologies

TABLE OF CONTENTS

INTRODUCTION
ARCHITECTURAL STUDY OF EMC GREENPLUM DCA
    Key Technology Pillars
    Scatter/Gather Streaming technology
    Master Servers
    Segment Servers
    gNet Software Interconnect
BACKUP SOLUTION FOR GREENPLUM DCA
    Using Data Domain Boost
    Backup a Database with gp_dump
    Automating Parallel Backups with gpcrondump
EXAMPLES OF BACKUP AND RESTORE
TEST RESULTS
    Objective
    Test 1
    Test 2
    Test 3
    Test 4
    Test 5
    Test 6
    Results
DISASTER RECOVERY SOLUTION FOR GREENPLUM DCA
    Segment failure on local site A
    Allocation and mounting of SAN devices on the DCA
    Moving Mirrors
    SAN Mirror SRDF/S consistency group
    SAN Mirror rotating snapshots
    Failover and Failback
    Time Analysis
HIGH AVAILABILITY SOLUTION FOR GREENPLUM
WORKING
PREPARE DCAs FOR SAN MIRROR
CONFIGURE THE VMAX FOR SAN MIRROR ATTACHMENT
CONFIGURE THE DCAs TO USE VMAX DISK
STARTING AUTOSNAP
MONITORING AUTOSNAP'S OPERATION
FAILOVER AND FAILBACK
    FAILOVER
    FAILBACK
CASE STUDY
CONCLUSION
REFERENCE

Disclaimer: The views, processes, or methodologies published in this article are those of the authors. They do not necessarily reflect EMC Corporation’s views, processes, or methodologies.


INTRODUCTION
Data warehouses are a critical tool for making business decisions. As data warehouse and business intelligence (DW/BI) systems continue to grow, they bring difficult challenges with them: high performance, high availability, protected storage, backup and disaster recovery, and constrained backup windows. With today's rapid data growth, a periodic full backup to tape or to a non-deduplicating disk-based storage system is no longer a viable option. Tape or non-deduplicated disk-based backups do not provide the recoverability and reliability customers demand in large-capacity data warehouse environments.

Big Data is, well, big, and size is not the only challenge it places on backup. It also is a backup application's worst nightmare because many environments consist of millions or even billions of small files. How do you design a backup infrastructure that will support the Big Data realities?

The business impact of DW/BI outages has caused CIOs to demand comprehensive strategies for data protection, security, and high availability. Data recovery options must align with application and business requirements to yield the highest availability and predictable, scalable performance.

With the enormous amount of storage increasingly required for backing up data, businesses are finding it ever more difficult to achieve storage efficiencies. EMC®, together with Greenplum®, has created the Data Computing Appliance (DCA). The DCA addresses essential business requirements and ensures predictable functional, performance, and scalability results. Combined with Data Domain® systems, the DCA provides a total solution for data warehousing deployment that addresses all of these key challenges. This Knowledge Sharing article will illustrate how to successfully back up, recover, and restore data from the DCA.

To integrate EMC Greenplum DCA into larger data centers, some customers require compatibility with advanced storage and software infrastructures—such as EMC Symmetrix® VMAX™—in order to achieve the highest levels of fault tolerance and to provide industry-leading disaster recovery. To facilitate this, EMC engineered a solution that integrates DCA with the VMAX, where the VMAX provides local and remote replicas of the data for disaster recovery and point-in-time snapshots. This allows customers to recover data warehouse/business intelligence (DW/BI) functionality quickly in the face of hardware or software failure, or even total site loss.


ARCHITECTURAL STUDY OF EMC GREENPLUM DCA
This section discusses the logical and physical architecture of the DCA. Some of the key architectural questions raised by every solution architect include:

• Is the hardware proprietary, non-proprietary, or standardized?
• Is this a shared-nothing, shared-disk, or shared-everything architecture?
• Capability: does it support in-database analytics?
• Scale: does it scale linearly?
• What are the ingestion rates? (The higher the rate, the better the real-time reporting.)
• Can I balance my workloads? How about mixed workloads?
• What are my DW management overheads?

This section provides the answers to all of these questions. But first, let's understand the basic architecture and components of the Greenplum DCA.

The DCA is a self-contained data warehouse solution that integrates all the database software, servers, and switches required to perform enterprise-scale data analytics workloads. The DCA is delivered racked and ready for immediate data loading and query execution.

The DCA provides everything needed to run a complete Greenplum database environment within a single rack. This includes:

• Greenplum database software
• Master servers to run the master database
• Segment servers that run the segment instances
• A high-speed interconnect bus (consisting of two switches) to communicate requests from the master to the segments and between segments, and to provide high-speed access to the segment servers for quick parallel loading of data across all segment servers
• An admin switch to provide administrative access to all of the DCA components


Figure 1: Architecture of Greenplum Data Computing Appliance

The DCA runs the Greenplum Database RDBMS software. The physical architecture of the DCA supports and enables the logical architecture of Greenplum Database, by utilizing the DCA components to perform its database operations and processing. The DCA consists of four operational layers: Compute, Storage, Database, and Network.

Layer      Description
Compute    The latest Intel processor architecture for excellent compute node performance.
Storage    High-density, RAID-protected Serial Attached SCSI (SAS) disks.
Database   Greenplum Database incorporating MPP architecture.
Network    Dual 10 GigE Ethernet switches provide a high-speed, expandable IP interconnect solution across the Greenplum Database.

Table 1: DCA Conceptual View and DCA Layers

Key Technology Pillars
1. World's fastest data loading: Scatter/Gather Streaming technology.
2. Fast query execution with linear scalability: shared-nothing MPP architecture.
3. Unified data access across the enterprise: dynamic query optimization and workload management.


Scatter/Gather Streaming technology
Greenplum's MPP Scatter/Gather Streaming (SG Streaming) technology eliminates the bottlenecks associated with other approaches to data loading, enabling a lightning-fast flow of data into the Greenplum DCA.

Figure 2: Scatter Streaming

The system uses a "parallel everywhere" approach to loading, in which data flows from one or more source systems to every node of the database without any sequential choke points. This approach differs from traditional "bulk loading" technologies—used by most mainstream database and MPP appliance vendors—which push data from a single source, often over a single channel or a small number of parallel channels, and result in fundamental bottlenecks and ever-increasing load times. Greenplum's approach also avoids the need for a "loader" tier of servers, as is required by some other MPP database vendors, which can add significant complexity and cost while effectively bottlenecking the bandwidth and parallelism of communication into the database.

Data can be transformed and processed on the fly, utilizing all nodes of the database in parallel, for extremely high-performance extract-load-transform (ELT) and extract-transform-load-transform (ETLT) loading pipelines.


Figure 3: Gather Streaming

Final "gathering" and storage of data to disk takes place on all nodes simultaneously, with data automatically partitioned across nodes and optionally compressed. This technology is exposed to the database administrator via a flexible and programmable "external table" interface and a traditional command-line loading interface.

Master Servers
There are two master servers, one primary and one standby. The primary Master server mirrors logs to the standby so that it is available to take over in the event of a failure.

The standby Master server is a warm standby. If the primary Master server fails, the standby is available to take over as the primary. The standby Master server is kept up to date by a process that synchronizes the write-ahead-log (WAL) from the primary to the standby. If the primary Master server fails, the log replication process is shut down, and the standby can be activated in its place. Upon activation of the standby, the replicated logs are used to reconstruct the state of the primary Master server at the time of the last successfully committed transaction.


Figure 4: Master Mirroring in Greenplum Database

The activated standby Master server effectively becomes the Greenplum Database primary Master server, accepting client connections on the master port.

The primary Master server does not contain any user data; it contains only the system catalog tables that need to be synchronized between the primary and standby copies. These tables are not updated frequently, but when they are, changes are automatically copied over to the standby Master server so that it is always kept current with the primary. The operations performed by the primary Master server are to:

• Authenticate client connections
• Process the incoming SQL, MapReduce, and other query commands
• Distribute the workload between the Segment Instances
• Present aggregated final results to the client program

Master server hardware specifications

Hardware                              Quantity   Specifications
Processor                             2          Intel X5680 3.33 GHz (6-core)
Memory                                           48 GB DDR3 1333 MHz
Dual-port converged network adapter   1          2 x 10 Gb/s
RAID controller                       1          Dual-channel 6 Gb/s SAS
Hard disk                             6          600 GB 10k rpm SAS


Master server software specifications

Software             Version
Red Hat Linux 5      5.5
Greenplum Database   4

Table 2: Hardware and software specifications of the Master server

Segment Servers
Each Segment Server runs several individual segment instances; these are small databases that each hold a slice of the overall data in the Greenplum Database. There are two types of Segment Instances:

1. Primary Segment Instances 2. Mirror Segment Instances

The DCA is shipped preconfigured, with six primary segments and six mirror segments per Segment Server. This is done to maintain a proper balance of primary and mirror data on the Segment Servers.

What is a Segment Instance? A Segment Instance is a combination of database process, memory, and storage on a Segment Server. A Segment Instance contains a unique section of the entire database. One or more Segment Instances reside on each Segment Server.

A mirror segment always resides on a different host and subnet from its corresponding primary segment. This ensures that in a failover scenario, where a Segment Server is unreachable or down, the mirror counterpart of the primary instance is still available on another Segment Server.
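A quick way to confirm this mirror placement and health from the master is the gpstate utility. The following is a minimal sketch, run as the gpadmin user; the install path of greenplum_path.sh is an assumption and should match the actual DCA installation.

# Hedged sketch: verify segment mirror placement and synchronization from the DCA master.
source /usr/local/greenplum-db/greenplum_path.sh   # install path is an assumption
gpstate -m      # list each mirror segment, the host it runs on, and its sync status
gpstate -f      # show standby master details and log replication status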

Function                               Description
Data storage                           User-defined tables and their indexes are distributed across the Segment Servers of the DCA.
Hardware hosts for Segment Instances   In the Greenplum Database, the database comprises multiple database segments (Segment Instances). Multiple Segment Instances are located on each Segment Server.
Query processing                       Segment Servers carry out the majority of query processing and data analysis.

Table 3: Functions of the Segment Server

Segment Servers run segment database instances (Segment Instances). The majority of query processing occurs either within or between Segment Instances. Each Segment Server runs many Segment Instances. Every Segment Instance has a segment of data from each user-defined table and index. In this way, queries are serviced in parallel by every Segment Instance on every Segment Server.

Users do not interact directly with the Segment Servers in a DCA. When a user connects to the database and issues a query, the connection is made to the primary Master Server. The primary Master then issues distributed query tasks, and processes are created on each of the Segment Instances to handle the work of that query.

Segment Server hardware specifications

Hardware                              Quantity   Specifications
Processor                             2          Intel X5680 2.93 GHz (6-core)
Memory                                           48 GB DDR3 1333 MHz
Dual-port converged network adapter   1          2 x 10 Gb/s
RAID controller                       1          Dual-channel 6 Gb/s SAS
Hard disk                             12         600 GB 10k rpm SAS

Segment Server software specifications

Software             Version
Red Hat Linux 5      5.5
Greenplum Database   4

Table 4: Hardware and software details of the Segment Server

gNet Software Interconnect
In shared-nothing MPP database systems, data often needs to be moved whenever there is a join or an aggregation process for which the data requires repartitioning across the segments. As a result, the interconnect serves as one of the most critical components within Greenplum Database. Greenplum's gNet Software Interconnect optimizes the flow of data to allow continuous pipelining of processing without blocking processes on any of the servers in the system. The gNet Software Interconnect is tuned and optimized to scale to tens of thousands of processors and uses industry-standard Gigabit Ethernet and 10 GigE switch technology.

Pipelining is the ability to begin a task before its predecessor task has completed. Within the execution of each node in the query plan, multiple relational operations are processed by pipelining. For example, while a table scan is taking place, selected rows can be pipelined into a join process. This ability is important to increasing basic query parallelism. Greenplum Database utilizes pipelining whenever possible to ensure the highest possible performance.

BACKUP SOLUTION FOR GREENPLUM DCA
A backup of Greenplum can be taken in any of three ways:

• Using Data Domain Boost
• Backing up a database with gp_dump
• Automating parallel backups with gpcrondump

Using Data Domain Boost
Data Domain Boost (DD Boost) performs part of the deduplication processing at the backup source, providing faster backup and restore while decreasing network traffic. Following are the steps to configure DD Boost:

• Create a configuration file, .ddconfig, in the user's home directory.
• Copy the configuration file to all segments.
• Configure distributed segment processing on the DD system and apply it to all the media servers and the OST plug-ins installed on them.
• Add interfaces into the interface group.
• Select one interface to register with the backup application. This interface will be used by the backup application and the OST plug-in to communicate with the DD system. This is important for backup with DD Boost.
• Enable the feature and verify the configuration.
• Enable low-bandwidth optimization in Data Domain. No reboot is required. This feature consumes additional CPU and memory on the DD system, so it should be used for optimized deduplication only when the aggregate bandwidth is less than 6 Mbps. It is supported on standalone DD systems only.
• Enable encrypted optimized duplication.

Backup a database with gp_dump
To run the backup, use the gp_dump command as shown below:
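A typical invocation might look like the following hedged sketch, assuming the tpchData database used in the later examples; --gp-c requests compressed dump files, and the exact options used in the original may differ.

# Hedged sketch: parallel dump of the tpchData database, run as gpadmin from the master.
gp_dump --gp-c tpchData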

This performs the following actions on the master and segment hosts:

On Master host:

• Dump the Greenplum configuration system catalog tables into a SQL file in the master data directory. File name is gp_catalog_1__.
• Dump a CREATE DATABASE SQL statement into a file in the master data directory. File name is gp_cdatabase_1__.
• Dump the user database schema into a SQL file in the master data directory. This file is used by gp_restore to create the database schema. File name is gp_dump_1__.
• Create a dump file in the master data directory to rebuild objects associated with the tables. File name is gp_dump_1___post_data.
• gp_dump launches a gp_dump_agent for each segment instance to be backed up.

On Segment host:

• By default, only the active instances are backed up. It dumps the user data for each Segment Instance into a SQL file in the Segment Instance's data directory. File name is gp_dump_0__.
• Create a log file in each Segment Instance's data directory. File name is gp_dump_status_0__.

Automating Parallel Backups with gpcrondump
gpcrondump is a wrapper utility for gp_dump that can be called directly from a crontab entry. It also allows backup of additional objects besides the data and database.

Procedure to schedule a dump operation using CRON:

• Log in as the superuser.
• Define a crontab entry that calls gpcrondump.


• Create a file in either the Greenplum superuser's home directory or in $GPHOME/bin.
• Provide an email address for notification purposes.
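As an illustration, the crontab entry might look like the hedged sketch below. The schedule, database name (tpchData), install path, and log file location are assumptions.

# Hypothetical crontab entry for the gpadmin user (edit with "crontab -e"):
# run gpcrondump against tpchData at 01:00 every night, answering prompts automatically (-a).
0 1 * * * . /usr/local/greenplum-db/greenplum_path.sh; gpcrondump -x tpchData -a >> /home/gpadmin/gpcrondump.log 2>&1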

Procedure to schedule a dump operation using CRON with Data Domain Boost:

• Complete the Data Domain Boost credentials setup.
• Add the --ddboost option. For example:
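A hedged example of these two steps follows. The Data Domain hostname and user are placeholders, and the credential-setup options should be verified against the gpcrondump reference for the installed release.

# One-time DD Boost credentials setup (hostname and user are placeholders):
gpcrondump --ddboost-host dd890.example.com --ddboost-user ddboostuser
# Dump the tpchData database to the Data Domain over DD Boost:
gpcrondump -x tpchData -a --ddboost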

EXAMPLES OF BACKUP AND RESTORE
Your data is important; its security is the primary objective. The following use cases show examples of backup and restore:

1. Example of performing a full backup and full restore of a database via NFS using Data Domain lz local compression only.
2. Example of backup and restore of a Row/Table in case it has been deleted or corrupted.
3. Example of backup and restore of a Schema in case it has been deleted or corrupted.
4. Example of backup and restore of a Database in case of corruption or deletion.

Case 1: Performing a full backup and full restore of a database via NFS using Data Domain lz local compression only. In this case, each server has a mount point to the NFS share on the DD system. The database, named "tpchData", is backed up using the gpcrondump utility. The backup data is written from the DCA servers to the DD share. The restoration will be done using gpdbrestore. This is the recommended way to back up the databases in parallel from each server to a common NFS mount point on the DD system.

STEP-BY-STEP PROCEDURE with screenshots:
1. From the DCA master server, run the following command for backup:
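The command in step 1 might have resembled the following hedged sketch, where /backup/DCA-01 stands in for the NFS mount point of the DD share (an assumption).

# Hedged sketch: parallel dump of tpchData from every server to the common DD NFS mount.
# -u redirects the dump files to the given directory; -a suppresses interactive prompts.
gpcrondump -x tpchData -u /backup/DCA-01 -a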

2. Confirm backup is successful.


3. Delete the table "region" from the database.


4. Restore the table.
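A hedged sketch of the table-level restore: gpdbrestore with -T restores a single table from the most recent dump. The schema-qualified name public.region and the backup directory are assumptions.

# Hedged sketch: restore only the "region" table from the latest tpchData dump on the NFS share.
gpdbrestore -s tpchData -u /backup/DCA-01 -T public.region -a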

5. Confirm restore is successful.


6. Verify the restored data from master server.

Case 2: Backup and restore of Row or entire Table in case it has been deleted or corrupted. This case explains the steps to restore the table to the point in time of the backup due to accidental deletion of rows or an entire database table.

STEP-BY-STEP PROCEDURE with screenshots: 1. Start the backup and delete the tables from the ―tpchData‖ database.


2. Restore the data.

3. Verify

Case 3: Backup and restore of Schema in case it has been deleted or corrupted. Schemas are a way to logically organize objects and data in a database. Schemas allow you to have more than one object (such as a table) with the same name in the database without conflict, as long as the objects are in different schemas. Newly created objects are placed in the "public" schema by default.

STEP-BY-STEP PROCEDURE with screenshots: 1. Back up the public schema.

2. Delete the schema.

3. Restore the schema.

4. Verify.


Case 4: Example of database backup and restore in case of corruption or deletion.

STEP-BY-STEP PROCEDURE with screenshots: 1. Back up the database.

2. Drop the database.


3. Create and restore the database.
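A hedged sketch of this step; depending on the options used in the original screenshots, gpdbrestore's own drop-and-recreate behavior may make the explicit createdb unnecessary.

# Hedged sketch: recreate the dropped database and restore it from the latest backup on the NFS share.
createdb tpchData
gpdbrestore -s tpchData -u /backup/DCA-01 -a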

4. Vacuum analyze to improve performance.
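Step 4 can be run from the master with psql; VACUUM ANALYZE is standard Greenplum SQL.

# Update optimizer statistics and reclaim space after the restore.
psql -d tpchData -c "VACUUM ANALYZE;"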

5. Confirm that the database is listed in the Greenplum system.


TEST RESULTS
Objective: The test objectives were to validate the success of data backup and restore. The various tests show the following:

• Backup speed with different compression options.
• Backup window over 7 days with a 5% daily increase of data.
• Comparing Data Domain compression rates.
• Comparing the Data Domain deduplication ratio using increased data loads.
• Observing the impact of running read queries and write queries on the database while running backups.


Test 1: A full backup of Greenplum was taken to the Data Domain using NFS with Greenplum backup compression on. The database size is 2.03TB. The following compression options were chosen:

lz, gzfast, and gz.

[Chart: Effective backup rates (TB/hr) for each Data Domain local compression type (lz, gzfast, gz) with Greenplum backup compression on]

Test 2: The database size is 2.03TB, which increases 5% daily over 7 days. The non-compressed backup took longer on the first day because it was the base backup, and its throughput was correspondingly lower.


[Chart: DCA non-compressed database size over the 7 daily backups, growing from 2.03 TB on day 1 to 2.72 TB on day 7 with the 5% daily increase]

[Chart: Backup duration in minutes for each of the 7 daily backups]

[Chart: Effective backup rate (TB/hr) for each of the 7 daily compressed backups of the non-compressed database to Data Domain]

Test 3: Due to the strength of Data Domain deduplication technology, the storage saving for each nightly backup results in significant time and cost savings.

[Chart: Daily storage savings - daily increase in data stored on DD (GiB) and nightly dedupe rate (x) versus a full daily backup]


Test 4: The cumulative effect of this storage saving over a 7 day backup cycle results in real storage savings in backup infrastructure and facility costs.

[Chart: Cumulative storage savings - cumulative ingest data (GiB) versus cumulative data stored on DD (GiB) over the 7 daily backups]

Test 5: Comparing deduplication rates from day 1 to day 7. The cumulative ingest data steadily increased, reaching 5.3TB on day 7. However, the cumulative data stored on DD was only 807GB.


[Chart: Cumulative dedupe rate (x) over the 7 daily backups, rising from 1.0 on day 1 to 6.6 on day 7]

Test 6: A full backup of the database was performed with a query load and a data load running against the DCA. The query load and data load were not run at the same time. The intention was to demonstrate the impact on each while the backup job was running.


[Chart: DCA database size before data ingest (2.03 TB) and after data ingest (3.05 TB)]

[Chart: DCA backup duration in minutes under no load, query load, and ingest load]

[Chart: Effective backup rate (TB/hr) under no load, query load, and ingest load]

Results
1. LZ compression gives the best performance.
2. The combination of gpcrondump compressed backups and Data Domain's deduplication technology results in an average speed of 6.93TB/hour for backups from day 2 to day 7.
3. An average of 37.4x storage saving for each nightly backup.
4. After 7 days of running backups, the Data Domain deduplication storage system delivers a 6.6x storage saving.
5. Storage savings = 85%.
6. Under concurrent query load, backup performance degradation is negligible.
7. Backup performance under a full data load, which is a write-intensive process, is only mildly affected, with no operational issues.
8. The effective backup rate under query load is close to that under ingest load.
9. The restore of a 2.72 TB database from a compressed backup was achieved in 28 minutes.
10. The average restore throughput of a Greenplum compressed backup was 5.79 TB/hr.


DISASTER RECOVERY SOLUTION FOR GREENPLUM DCA
EMC provides an engineered solution that integrates DCA with VMAX to provide local and remote replicas of the data for disaster recovery and point-in-time snapshots. EMC Greenplum DCA maintains two copies of customer data and normally handles all data replication and protection tasks internally to the appliance. This generally achieves the highest level of performance for DW/BI tasks. In a SAN Mirror solution, the second copy of the data is moved to SAN-based storage. The DCA retains the primary copy of the data in order to maximize query performance. The SAN Mirror copy is updated with writes, but it is not read unless a primary database segment becomes inaccessible.

[Figure 6: Solution Architecture for DR of Greenplum DCA - comparing a standalone DCA, a DCA with SAN Mirror (mirror copy on an EMC VMAX), and a DCA with SAN Mirror remote replication to a remote-site VMAX holding the DR copy and point-in-time copy]

By keeping the mirrored copy on the SAN, customers can use storage facilities such as EMC TimeFinder® and SRDF® to create remote copies or point-in-time images for backup and DR.

Segment failure on local site A
In the DCA, six primary segment instances and six mirror segment instances run on each Segment Server. When a Greenplum Database™ system is deployed, there is an option to mirror the segments, which allows the database to remain operational if a segment instance or segment host goes down. A periodic database checkpoint helps to ensure consistency between the two instances. Figure 7 demonstrates the example with four segments only: the Segment Server 1 primaries (Primary0 and Primary1) on local storage have associated mirrors (Mirror0 and Mirror1) on the SAN for Segment Servers 2 and 3.

Figure 7: Segment Server with associated primary and secondary server

When the Segment Server fails, the primary instances on the failed Segment Server become unavailable. The active Master Server detects this and the associated mirror instances of the failed primary instances are promoted to the role of primaries. These mirror instances that have now been promoted to primaries have their storage serviced by the VMAX.

When the failed Segment Server is restored, its instances are recovered, the data is synchronized from the corresponding primaries, and the instances continue servicing the workload.
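Segment recovery and the return to preferred roles are normally driven with the gprecoverseg utility; the following is a minimal sketch run as gpadmin from the master.

# Hedged sketch: recover failed segments and rebalance roles after the Segment Server is restored.
gprecoverseg -a      # incremental recovery: resynchronize failed segments from their live peers
gpstate -m           # wait until all mirrors report as synchronized
gprecoverseg -r -a   # return primaries and mirrors to their preferred roles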

Figure 8: Segment Server failure

In the case of a Segment Server failure on Site A, the rotating snapshot continues to run on Site B and the validated snapshot remains in place. The rotating snapshots keep operating on the other two sets of snapshot devices that are available, but continue to fail the database consistency check.

While re-synchronization is in progress, the remote database continues to report as inconsistent until the re-synchronization process is completed. When the failed Segment Server is fully recovered and back to its normal operating state, the rotating snapshots successfully complete the consistency check and begin rotating around the three sets of snapshot devices, while again always keeping a validated snapshot in place.

Allocation and mounting of SAN devices on the DCA
Following are the steps:

1. Create a storage group with one or more devices.
   symaccess -sid 1836 create -name sdw1 -type storage devs 034D,035D
2. Create a port group with one or more director/port combinations.
   symaccess -sid 1836 create -name vmaxsw1 -type port -dirport 5F:0,6F:0,7F:0,8F:0,9F:0,10F:0,11F:0,12F:0
3. Create an initiator group.
   symaccess -sid 1836 create -name sdw1 -type initiator -wwn 100000051e74804b
4. Update it to include the further WWNs of the FCoE cards contained in the server.
   symaccess -sid 1836 -type initiator add -name sdw1 -wwn 100000051e74804c
5. Create a masking view containing the storage group, port group, and initiator group created previously.
   symaccess -sid 1836 create view -name sdw1 -pg vmaxsw1 -ig sdw1 -sg sdw1

Moving Mirrors Following are the steps:

1. Remove the standby Master Server (smdw).
   gpinitstandby -r
2. Mount the SAN device on the standby Master Server.
   mount -o noatime,inode64,allocsize=16m /dev/emcpowera1 /data/master
3. Initialize and activate the standby Master Server.
   gpinitstandby -s smdw
   gpactivatestandby -f -d /data/master/gpseg-1/
4. Delete the master data directory on the Master Server.
   rm -r /data/master/*
5. Mount the SAN device on the Master Server.
   mount -o noatime,inode64,allocsize=16m /dev/emcpowera1 /data/master
6. Initialize and activate mdw as a standby.
   gpinitstandby -s mdw
   gpactivatestandby -f -d /data/master/gpseg-1/
7. Initialize the standby Master Server (smdw).
   gpinitstandby -s smdw

SAN mirror SRDF/S consistency group
Following are the steps:

1. Create the consistency group on the source site.
   symcg create sanmirrdf -rdf_consistency -type rdf1
2. Add devices to the group.
   symcfg list -rdfg all
   symcg -cg sanmirrdf -sid 55 addall dev -rdfg 1
3. Perform initial full synchronization.
   symrdf -cg sanmirrdf establish
4. Enable consistency on the devices.
   symcg -cg sanmirrdf enable

SAN mirror rotating snapshots
Following are the steps:

1. Check that the snapshot devices to be used contain the required VDEVs.
2. Mask the snapshot devices to each Segment Server and Master Server.
3. Scan the SCSI bus.
4. Perform PowerPath discovery.
5. Create the TimeFinder/Snap snapshot.
   symsnap -sid 36 -f snapx create -svp snap
6. Activate the snapshot.
   symsnap -sid 36 -f snapx activate
7. Mount the snapshot.
8. Check the consistency.
9. Unmount the VDEVs from the DCA.


Figure 9: A high-level diagram of SAN Mirror mount points on the VMAX

Failover and Failback
Following are steps for failover:

1. Terminate the rotating snap script.
2. Unmount the SAN devices.
3. Initiate the SRDF failover (see the sketch after this list).
4. Scan the SCSI bus.
5. Perform PowerPath discovery.
6. Mount the R2 devices.
7. Check the consistency of the Greenplum database.
8. Synchronize the mirror and the primary segment servers.
9. Return the primary and mirror instances to their preferred roles.
10. If the consistency check fails, restore the last validated snapshot to the R2 devices.
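Using the consistency group created earlier (sanmirrdf), the PowerPath tools, and the mount options shown in the Moving Mirrors procedure, steps 2 through 6 might look like the hedged sketch below; the device names and mount points are assumptions, and the remaining steps use the Greenplum utilities already described.

# Hedged sketch of the SRDF failover portion, executed for the remote (R2) site.
umount /data/master                          # step 2: unmount any SAN devices still mounted
symrdf -cg sanmirrdf failover                # step 3: write-enable the R2 devices at the remote site
for h in /sys/class/scsi_host/host*; do      # step 4: rescan the SCSI bus on each server
    echo "- - -" > "$h/scan"
done
powermt config && powermt display dev=all    # step 5: PowerPath discovery of the R2 devices
mount -o noatime,inode64,allocsize=16m /dev/emcpowera1 /data/master   # step 6: mount an R2 device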


Figure 10: Failover to R2

Following are steps for failback:

At R2 site:

1. Stop database on the remote DCA.
2. Unmount the SAN devices on the remote DCA.

At R1 site:

1. Stop the database and unmount the R1 devices.
2. Run the symrdf failback command (see the sketch after this list).
3. Mount the R1 devices to the DCA servers.
4. Bring the database up in admin mode only.
5. Switch roles.
6. Shut down the database.
7. Bring the database up in full mode.
8. Check for database consistency.
9. Log in as gpadmin.
10. Start the database using gpstart -a.
11. Synchronize the mirror and the primary segment servers.
12. Return the primary and mirror instances to their preferred roles.
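The SRDF portion of the failback is sketched below under the assumption that the standard SYMCLI failback action against the same consistency group is what step 2 refers to; verify the group state before remounting the R1 devices.

# Hedged sketch: after both sites are unmounted, return production data flow to the R1 site.
symrdf -cg sanmirrdf failback                 # copy changes from R2 back to R1 and write-enable R1
symrdf -cg sanmirrdf verify -synchronized     # confirm the group is fully synchronized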


Time Analysis
This analysis is based on a 6 TB database, of which 4.5 TB is on disk.

Activity                    Required Time (minutes)
One Snap operation          15
SRDF failover operation     48
Snap restore operation      180
Failback prep               6
Failback at R1 site         30
Total failback time         36

HIGH AVAILABILITY SOLUTION FOR GREENPLUM
In a "High Availability" (HA) configuration, a primary and standby DCA are both connected to SAN storage. SAN Mirror is used to mirror data between the primary DCA and the SAN storage, and TimeFinder/Snap is used to take periodic, consistent copies of the production data. Each snap is verified on the standby DCA.

If the primary DCA becomes unavailable, the most recent copy of the production data can be mounted by the standby DCA and processing can be quickly resumed. This also provides a mechanism to recover from data corruption on the primary DCA.

The master and standby master databases in each DCA normally are not mirrored. In the HA configuration, those databases must be moved from DCA internal disk to SAN disk to ensure recoverability.


Figure 11: SAN Mirror and High Availability Configuration

This solution increases the DCA’s internal capacity as the mirrored data is moved to the SAN, which frees the other half of the machine’s capacity.

WORKING
An EMC Greenplum Database is actually a loosely-coupled array of individual, highly-customized PostgreSQL databases, working together to present a single database image. The master is the main entry point of the database. The master host contains all the metadata required to distribute transactions across the system, and it does not hold any user data. This node is used to manage all the segment servers. During database deployment there is an option to mirror the segments. Primary segment instances replicate to mirror instances at the sub-file level. Periodic checkpoints and journal updates maintain the consistency between primary and mirrored data.

On segment failure, the remaining copy goes into change-tracking mode and saves a list of changes made to its data. When the failed segment is recovered, either the incremental changes are copied or a full copy is executed to make it "re-mirrored".

In the DCA, six primary Segment Instances and six mirror Segment Instances run on each Segment Server. The segments use internal DCA storage. Each mirror is on a separate segment host from its primary. To make the process fully redundant, the master is also mirrored by a specialized log replication process to a standby master that remains in warm standby mode.


Figure 12: High-level depiction of the Mirrored Configuration

In a SAN Mirror configuration, storage for the database master, the standby master, and all the mirrored segments is moved from internal DCA storage to external SAN storage, which allows the use of TimeFinder and SRDF.

Figure 13: Conceptual depiction of a SAN Mirror Configuration

PREPARE DCAs FOR SAN MIRROR
1. Configuring SAN Mirror requires PowerPath on all servers. The AUTOSNAP utility is designed to use /dev/emcpower devices; it will not work properly if other devices are mounted.
2. The masters and standby masters execute TimeFinder SYMCLI commands, and were also used to create and assign VMAX devices in the lab. SYMCLI must be set to allow the same set of primary drives to appear in multiple groups, as AUTOSNAP requires that each set of snap devices be in a different disk group. As part of the SYMCLI install, the following environment variable must be set in the root user's .bash_profile or .bashrc to allow multiple virtual snaps of the same STD device.

SYMCLI_MULTI_VIRTUAL_SNAP=ENABLED export SYMCLI_MULTI_VIRTUAL_SNAP

3. Capture the WWNs of the Brocade 1020 Converged Network Adapter (CNA) in each Master Server and Segment Server; the CNA provides Fibre Channel over Ethernet (FCoE) connectivity to the SAN.

4. The DCA is configured with two EMC Connectrix® MP-8000B switches. Each switch contains eight Fibre Channel ports for SAN connection. By default, FCoE is not configured on the switches, so it must be enabled individually on each of the MP-8000B's internal Ethernet ports for SAN connectivity. Each Converged Network Adapter (CNA) will then show up as a separate initiator to be zoned on the SAN. To initially configure the MP-8000B switches with the supplied scripts, simply execute them from the master:

CONFIGURE THE VMAX FOR SAN MIRROR ATTACHMENT
1. Create volumes for the DCA masters, standby masters, and segment mirror databases: Each Segment Server supports two 2.7 TB database LUNs. In normal operation, one of these LUNs is used for a mirror database instance, and the other is used for a primary instance. In a SAN Mirror environment, the mirror LUN can be used for additional production data. The Master Server and Standby Master Server each support a single 2.1 TB LUN. To implement SAN Mirror, LUNs of the same size should be created on the VMAX and made visible to the servers. Volumes can be created using the symconfigure CLI command.

2. Create snap volumes for the standby DCA: Snaps provide a flexible mechanism to make instant, consistent copies of the database volumes. As hosts write data, the original data is preserved in a "SAVE" pool and pointers for the snap volumes are changed to point to this preserved data. Only changed data is tracked by TimeFinder/Snap. The command below creates the virtual devices once the SAVE pool has been created.
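The original symconfigure screenshots are not reproduced; the sketch below is illustrative only. The device counts, sizes (in cylinders), mirror protection, and SAVE-pool setup are assumptions and must be adapted to the actual array configuration.

# Hedged sketch: create standard devices for the SAN Mirror LUNs and virtual devices (VDEVs)
# for TimeFinder/Snap. All counts and sizes below are placeholders.
cat > create_devs.cmd <<'EOF'
create dev count=8, size=24000, emulation=FBA, config=2-Way Mir;
create dev count=24, size=24000, emulation=FBA, config=VDEV;
EOF
symconfigure -sid 1836 -f create_devs.cmd preview   # syntax and feasibility check only
symconfigure -sid 1836 -f create_devs.cmd commit    # apply the configuration change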


3. Create disk groups for SAN Mirror: Once the volumes are created, they should be added to Symmetrix disk groups. Each set of snap devices must reside in a different disk group. Each group should contain the same primary devices, but a different set of snap devices.
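A hedged sketch of one such group, reusing the example device numbers from step 4 (0755 as a production device, 082D as a VDEV); the group name is an assumption.

# Hedged sketch: one disk group per snap rotation set; each set holds the same standard devices
# but different VDEVs (this relies on SYMCLI_MULTI_VIRTUAL_SNAP=ENABLED, set earlier).
symdg create DCA_DG1 -type regular
symld -g DCA_DG1 -sid 1836 add dev 0755           # production (standard) device
symld -g DCA_DG1 -sid 1836 add dev 082D -vdev     # snap target for this rotation set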

4. Assign volumes to the DCA and standby DCA: Once the volumes are created, they are assigned using the symaccess command. The new volumes should be put into storage groups, one per server. The CNA WWNs that were captured earlier can be used to create the initiator groups. Volume 0755 is a production volume; volume 082D is a virtual device (TimeFinder/Snap). VDEVs should be assigned both to the production host and to the standby DCA.

CONFIGURE THE DCAs TO USE VMAX DISK
Once the devices are created and assigned to the DCA, they can be recognized by the servers and you can build file systems and mount them on the primary DCA. The dca_setup script assumes that the primary internal RAID volumes are mounted on /data1 and /data2. Beneath those directories, the SAN Mirror volumes will be mounted on the ./san_mirror sub-directory. Follow the steps below to configure the DCA to use VMAX disk:

1. Download the "inq.linux" command from EMC. This command is used to probe the SCSI bus and report which devices it sees.
2. Download the PyYAML module from http://pyyaml.org. The AUTOSNAP utility uses a configuration file written in the YAML markup language and needs a Python module to parse the data.
3. Reboot the servers.
4. Verify that all the disks are recognized by the servers.


5. Partition the VMAX disks to align the file systems with the VMAX cache for optimal performance.

6. Build the XFS file systems.

7. Mount the power devices on the /data1/san_mirror and /data2/san_mirror directories on each Segment Server, and on the /data directory of the Master and standby Master Servers (a combined sketch of steps 5-7 follows this list).

8. Run DCA setup for a new configuration, i.e. when no data is present on the box. For existing configurations, move data with gpmovemirrors. Downtime is required to move the master database to VMAX storage.

9. Copy the AUTOSNAP utility to the Master and standby Master Servers.

10. Create the YAML file for AUTOSNAP.

11. To make the Snaps visible to the standby DCA, an initial Snap must be taken from the primary DCA. This must be performed for each primary+vdev set. For instance, the lab configuration had three device groups with the same primary volumes, but different Snap vdevs. This process was executed three separate times to recognize each set of vdevs and to assign appropriate /dev/emcpower device names to them. At this point, the commands must be executed manually. Only after the initial snaps are taken and the devices are recognized by the standby DCA can the process be automated.

12. Recognize the snaps on the standby DCA.

13. Source the greenplum_path.sh script to find the correct libraries.

14. Run the single snap command using AUTOSNAP in debug mode.
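The partition, file-system, and mount commands for steps 5 through 7 might look like the hedged sketch below for a single device on a Segment Server. The power device name and partition offset are assumptions; the mount options are the same ones used in the earlier Moving Mirrors procedure.

# Hedged sketch of steps 5-7; repeat for each /dev/emcpower device reported by "powermt display dev=all".
parted -s /dev/emcpowerb mklabel gpt
parted -s /dev/emcpowerb mkpart primary 1MiB 100%    # aligned partition start (offset is an assumption)
mkfs.xfs -f /dev/emcpowerb1                          # step 6: build the XFS file system
mkdir -p /data1/san_mirror
mount -o noatime,inode64,allocsize=16m /dev/emcpowerb1 /data1/san_mirror   # step 7: mount under /data1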


STARTING AUTOSNAP
AUTOSNAP should be started from the command line when the database is running on the primary. The standby DCA should be up but the database should not be running. The AUTOSNAP configuration file specifies the device groups to be used, the standby DCA address, and how often snaps are to be taken.

MONITORING AUTOSNAP's OPERATION
AUTOSNAP logs all of its operation to /var/log/autosnap_.log on the primary Greenplum system. It provides no automatic email or SNMP notifications presently, so the log should be monitored for errors. It is also important to monitor AUTOSNAP's record of completed snaps by running ./autosnap --listsnaps periodically. If snaps aren't completing correctly, the listsnaps command will clearly show it.

If AUTOSNAP cannot execute a snap and verify, it will wait for the next time interval and retry. There are very few errors that will cause it to abort. If AUTOSNAP encounters an error or is killed by a user, it will not automatically restart. Therefore, it is also recommended to execute "ps -ef | grep autosnap" periodically to verify it is running.
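Because AUTOSNAP neither sends notifications nor restarts itself, a simple cron-driven check can cover both recommendations. The sketch below is an assumption-laden example: the log path pattern follows the text above, and the mail address is a placeholder.

# Hedged sketch: run periodically from cron on the primary master to watch AUTOSNAP.
if ! pgrep -f autosnap > /dev/null; then
    echo "AUTOSNAP is not running on $(hostname)" | mail -s "AUTOSNAP down" dba@example.com
fi
grep -i error /var/log/autosnap_*.log | tail -20   # surface recent errors from the AUTOSNAP log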

FAILOVER AND FAILBACK
Snaps play an important role in moving processing from the primary DCA to the standby DCA (failover). The advantage of using this facility is the ability to capture system status at a point in time without using much storage capacity. The snap is used later to recreate a similar configuration on the standby DCA in case of a "dead" primary. The AUTOSNAP utility takes a series of snapshots of the production Greenplum database and stores the results and the current status on the primary DCA. In the rotating pool, AUTOSNAP overwrites the oldest snap in round-robin fashion and retains the newest good snap. The actual work of creating and checking the snaps is done by the standby DCA.

FAILOVER
Procedure to move from the primary to the standby DCA:

1. Obtain the latest snap which is residing on the primary DCA.


DCA_DG_SNAP1 is shown as the latest consistent snap, which can be used for recovery.

3. The utility has a record of the configuration and will automatically mount the disk group's members in the correct places. Issue the autosnap command on the standby DCA using the internal mount switch.

Check and verify to ensure the snap was complete and successful.


If the check is successful, start the database; if the check fails, unmount the snap.

Each snap listed previously in the *.yml file should be mounted and checked. Once a consistent snap is found, start the database as the gpadmin user.

4. The consistent snap copy should be copied back to the standby DCA’s internal volumes. A FULL copy must be executed since an incremental copy will corrupt the database.

FAILBACK

1. Log in using gpadmin user and stop the database on both primary and standby DCA.

2. Unmount the primary and standby DCA SAN volume to execute TimeFinder restore from the most recent gold copy.

3. Obtain the latest snap from the primary DCA.

4. Set the contents of the primary DCA's SAN volumes back to the state they were in as of the last gold copy (see the sketch after this procedure).

Monitor the progress.


Terminate the snap session when all devices are in a restored state.

5. Remount the volume on the primary DCA.

6. The database has two conflicting database images on disk. The corrupted image is on the primary DCA’s internal disk and the restored image is on the SAN Mirror disk. By default, the DCA will use the internal disk, which is the corrupted image. We will make the DCA read from the SAN Mirror copies. Start the database in maintenance mode to allow configuration changes without changing user data using gpadmin login.

7. Connect to the database in utility mode.

8. Verify that the primary segments—the segments whose preferred_role is "p"—are set to a role of ’m’ (mirror) and a status of ’d’ (down) and that the mirror segments are set to a role of ’p’ (primary), a mode of ’c’ (change tracking), and a status of ’u’ (up).

9. Swap the roles.

10. Then, set the primary volumes to a role of ’mirror’ and a status of ’down’.

11. Verify the configuration is correct. 12. Exit psql session.

13. Stop the database and remove it from maintenance mode.

14. Restart the database normally.


15. Copy the SAN Mirror volumes back to the DCA’s primary volume to ensure full redundancy and performance. Execute the full copy.
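For steps 3, 4, and 8 above, a hedged sketch follows. It assumes the TimeFinder/Snap restore is driven against a disk group named DCA_DG1 and that the role check is the standard gp_segment_configuration query in utility mode; the exact commands behind the original screenshots are not reproduced, and the SYMCLI options should be checked against the TimeFinder/Snap documentation.

# Hedged sketch: restore the primary DCA's SAN volumes from the last gold-copy snap (steps 3-4).
symsnap -g DCA_DG1 restore                  # copy the snap data back to the source devices
symsnap -g DCA_DG1 verify -restored         # wait until all devices report the restored state
symsnap -g DCA_DG1 terminate -restored      # end the restore session once complete

# Role check for step 8: preferred primaries should show role 'm' and status 'd',
# mirrors role 'p', mode 'c' (change tracking), and status 'u'.
PGOPTIONS='-c gp_session_role=utility' psql -d postgres -c "SELECT dbid, content, role, preferred_role, mode, status FROM gp_segment_configuration ORDER BY content;"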

CASE STUDY
Case studies are one of the most important parts of technical analysis. A case study is an intensive analysis of an individual unit stressing development factors in relation to context. Here we will study and analyze a case where Greenplum DCA made the difference.

Tagged is a social network for meeting new people. While other social networks are for staying in touch with people you already know, Tagged enables social discovery, group interests, and connecting via shared interests. Based in San Francisco, California, Tagged consistently ranked among the largest social networks.

Challenges: Growth of new connections and a rapid increase in data volumes were making the business more complex. Meeting the expectation of generating a response in minutes for complex and targeted queries, making quicker decisions to drive traffic, and ensuring high-quality use of features are just a few of the key challenges.

“Our data warehouse took a few hours or even a day to process straightforward questions from our business analysts. In a business that changes as quickly as ours, that’s way too long,” says Johann Schleier-Smith, Tagged co-founder and CTO.

Existing product: Oracle-based data mart.

Challenges:
• Analyze complete datasets (a database measured in petabytes), not samples or summaries.
• Perform simple and complex enquiries for intraday analysis and response.
• Predictive and advanced analytics.
• Keep pace with the rapid growth of the database and its complexity.

Solution:
• EMC Greenplum DCA
• EMC Professional Services

Benefits:
• Ultra-fast analysis; scalable, advanced, and predictive.
• Users look much deeper into each report.
• 50% increase in time spent by members.


Business Impact
• The company can continuously load data, execute analytics in real time, and generate more detailed reports.
• Collaborate with gaming partners to make the social games more fun, so that users want to keep playing.
• More effectively crunch data from 70 million user accounts.

Competitive Advantage
• Members are spending more time on the site, increasing ad revenue.
• Members are exposed to more ads and opportunities to spend Tagged Gold, a virtual currency, which drives the business forward.
• Tagged is adding 10,000 users daily.

How the Business Was Impacted
• The number of users grew from 20 million to 100 million.
• 3 PB of data was downloaded per month by users.
• 25 million visitors visit the site every month.

Converting Big Data into Big Business: Greenplum's technology serves as a foundation for Tagged to unlock the business value inherent in "big data". Tagged now ships new code every day, compared to its previous weekly release cycle.

“Faster time to market is essential because it helps us strengthen our competitive position. Greenplum delivers the insights we need to make better decisions that drive the business forward.”

JOHANN SCHLEIER-SMITH, CO-FOUNDER AND CTO

CONCLUSION
"If data is an asset, then big data is a big asset."

EMC Greenplum DCA is the sweet spot for the Big Data industry. This powerful appliance delivers the fastest data loading and best price/performance in the industry. It is a complete solution which enables data protection, business continuity, and faster performance.


Some of the more compelling concluding points include:

• Operational simplicity through ease of integration and management.
• Enables a private cloud-virtualized DW and analytic infrastructure.
• Start with ½-rack or 1-rack configurations, and scale up to 24 racks. It grows to true petabyte scale (2-5 PB compressed).
• Integrated backup solution using Data Domain and Symmetrix VMAX.
• SAN-ready to enable advanced storage features, with site-to-site replication using RecoverPoint.
• Consolidates data marts to reduce TCO.

Figure 14 shows how Greenplum DCA compares with other competitive appliances.

Figure 14


REFERENCE
1. Configuring EMC Greenplum Data Computing Appliance using SAN Mirror and EMC Symmetrix VMAX for Disaster Recovery – Configuration Guide, EMC Corporation.
2. Enhancing EMC Greenplum High Availability with SAN Mirror and EMC Symmetrix VMAX – Configuration Guide, EMC Corporation.
3. Tagged case study – http://www.greenplum.com/customers/tagged
4. Backup and Recovery of the EMC Greenplum Data Computing Appliance using EMC Data Domain – An Architecture Overview, White Paper, EMC Corporation.
5. Greenplum Database 4.2 – Administration Guide, EMC Corporation.
6. Big Data Appliance, TDWI, July 23, by R. Sathyanarayana.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
