<<

EMC® Appliance

Appliance Version 2 / Version 3.3.0.0

Getting Started Guide

PART NUMBER: 302-004-091
REVISION: 01
Copyright © 2017 Dell Inc. or its subsidiaries. All rights reserved. Published June 2017.
Dell believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS-IS.“ DELL MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. USE, COPYING, AND DISTRIBUTION OF ANY DELL SOFTWARE DESCRIBED IN THIS PUBLICATION REQUIRES AN APPLICABLE SOFTWARE LICENSE.

Dell, EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be the property of their respective owners. Published in the USA.

EMC Corporation
Hopkinton, Massachusetts 01748-9103
1-508-435-1000   In North America 1-866-464-7381
www.EMC.com

Contents

Preface
  Welcome
  About This Guide
  Document Conventions
    Text Conventions
    Command Syntax Conventions
  Getting Support
    Product information
    Technical support

Chapter 1: About the DCA
  About the DCA
    Two Appliance Versions
    DCA Module Types
  Racking Guidelines
    Rack Types
    Rack Density
  About the Network Configuration
  DCA Modules and Master Servers
    Master Servers
    GPDB Modules
    Accelerator Modules
    HD Compute Modules
    Hadoop Master and Worker Modules
  GPDB Overview and Upgrade Tasks
    About GPDB
    About the Master Servers
    About the Segment Hosts
    GPDB Upgrade Tasks

Chapter 2: Supported Software Applications
  GPDB
  Pivotal Greenplum Command Center
  Pivotal Hadoop
  HAWQ
  Pivotal HD with EMC Isilon
  Pivotal Command Center
  Supported software application versions

Chapter 3: Preparing the Data Center Environment
  Confirming Site Requirements
    Floor Space Requirements
    DCA Rack Dimensions
    Connecting New Racks to the Power Supply
    Power Cord Specifications
    Environmental Requirements
    Air Quality Requirements
  Optional Securing Brackets
    Anti-Tip Bracket
    Anti-Move Bracket
    Seismic Restraint Bracket
  Cabinet Positioning
  Package Dimensions and Clearance

Chapter 4: Planning for a Multiple Rack DCA

Chapter 5: Gathering Site-Specific Information
  Site Requirements Checklist
  Plan for Hadoop Networking
    VLAN Overlay
  Planning for Remote Support - ESRS and Dialhome

Chapter 6: DCA Administration
  DCA utilities
    Description
    Options
  ConnectEMC Dial Home Capability
  Web-Based Management Options
    Pivotal Greenplum Command Center
    Pivotal Command Center
  GPDB Email and SNMP Alerting
    SNMP on the DCA
  DCA MIB information
    MIB Locations
    MIB Contents
    View MIB
    Integrate DCA MIB with environment
    Change the SNMP community string
    Set an SNMP Trap Sink
  General Maintenance Tasks
    Routine Vacuum and Analyze
    Routine Reindexing
    Managing GPDB Log Files
  Next Steps

Chapter 7: Power Down the DCA

Chapter 8: Next Steps
  Documentation Resources
  Providing Access to GPDB
  Creating and Loading Data

Appendix A: Red Hat Enterprise End User License Agreement

Glossary

Preface

This guide is intended for EMC personnel, partners, database and system administrators, and customers to plan for installing a new Data Computing Appliance (DCA) into a data center. This guide provides an overview of the system, information on data center requirements, a checklist of items needed for software configuration, and links to relevant documentation for use in the next steps of deployment. This guide also contains an overview of the appliance configuration. Verify that the requirements listed in this document are satisfied before performing a DCA installation.
• Welcome
• About This Guide
• Document Conventions
• Getting Support

Welcome
Welcome to EMC, and congratulations on the acquisition of your new EMC DCA product. To help you get started as a new EMC customer, please visit our online support Welcome Center at http://www.emc.com/support/new-customers/index.htm. Here, you will find information to help you gain access to the tools and resources you need to successfully support your EMC products. In addition, you will be introduced to our Online Support site (Support.EMC.com), which is your single destination for support and online access to numerous resources, including product-specific support information and downloads, software license activation, service request creation and management, self-help tools, and a single view of your entire EMC installed base. You can also access our lively Support Community and quickly connect with an EMC technical support specialist via Live Chat.

About This Guide
This guide assumes knowledge of Linux/UNIX system administration, database management systems, database administration, and structured query language (SQL). This guide contains the following chapters and appendices:
• Chapter 1, “About the DCA” explains the architecture, components, and configuration of Pivotal™ Greenplum Database® (GPDB) on the DCA.
• Chapter 2, “Supported Software Applications” describes the optional software applications supported by the DCA.
• Chapter 3, “Preparing the Data Center Environment” describes site requirements for the DCA, securing brackets, cabinet positioning, and package dimensions and clearance.
• Chapter 4, “Planning for a Multiple Rack DCA” contains information required to plan for a multiple rack DCA.
• Chapter 5, “Gathering Site-Specific Information” contains a site requirements checklist, a plan for Hadoop networking, and information on remote support.
• Chapter 6, “DCA Administration” describes the general database maintenance tasks and the tools available to diagnose, monitor, and troubleshoot a GPDB system running on the Data Computing Appliance.
• Chapter 7, “Power Down the DCA” explains how to power down the DCA safely.
• Chapter 8, “Next Steps” explains the next steps for implementing your data warehouse requirements in GPDB.
• “Glossary” defines DCA components and terminology.

Document Conventions
The following conventions are used throughout the DCA documentation to help you identify certain types of information.
• Text Conventions
• Command Syntax Conventions

Text Conventions

Table Preface.1 Text Conventions

bold
  Usage: Button, menu, tab, page, and field names in GUI applications.
  Example: Click Cancel to exit the page without saving your changes.

italics
  Usage: New terms where they are defined; database objects, such as schema, table, or column names.
  Examples: The master instance is the postgres process that accepts client connections. Catalog information for GPDB resides in the pg_catalog schema.

monospace
  Usage: File names and path names; programs and command names and syntax; parameter names.
  Examples: Edit the .conf file. Use gpstart to start GPDB.

monospace italics
  Usage: Variable information within file paths and file names; variable information within command syntax.
  Examples: /home/gpadmin/config_file; COPY tablename FROM 'filename'

monospace bold
  Usage: Calls attention to a particular part of a command, parameter, or code snippet.
  Example: Change the host name, port, and database name in the JDBC connection URL: jdbc:postgresql://host:5432/mydb

UPPERCASE
  Usage: Environment variables; SQL commands; keyboard keys.
  Examples: Make sure that the Java /bin directory is in your $PATH. SELECT * FROM my_table; Press CTRL+C to escape.


Command Syntax Conventions

Table Preface.2 Command Syntax Conventions

{ }
  Usage: Within command syntax, curly braces group related command options. Do not type the curly braces.
  Example: FROM { 'filename' | STDIN }

[ ]
  Usage: Within command syntax, square brackets denote optional arguments. Do not type the brackets.
  Example: TRUNCATE [ TABLE ] name

...
  Usage: Within command syntax, an ellipsis denotes repetition of a command, variable, or option. Do not type the ellipsis.
  Example: DROP TABLE name [, ...]

|
  Usage: Within command syntax, the pipe symbol denotes an “OR” relationship. Do not type the pipe symbol.
  Example: VACUUM [ FULL | FREEZE ]

$ system_command, # root_system_command, => gpdb_command, =# su_gpdb_command
  Usage: Denotes a command prompt; do not type the prompt symbol. $ and # denote terminal command prompts. => and =# denote GPDB interactive program command prompts (psql or gpssh, for example).
  Examples: $ createdb mydatabase; # chown gpadmin -R /datadir; => SELECT * FROM mytable; =# SELECT * FROM pg_database;

Getting Support EMC support, product, and licensing information can be obtained as follows.

Product information For DCA product-specific documentation, release notes, or software updates, go to the EMC Online Support site at http://support.emc.com, click Support By Product, and search for Data Computing Appliance.

Technical support For technical support, go to http://support.emc.com. The Support page includes several support options, including an option to request service. Note that to open a service request, you must have a valid support agreement. Please contact your EMC sales representative for details about obtaining a valid support agreement or with questions about your account.



1. About the DCA

The Data Computing Appliance is a self-contained solution that integrates all of the database software, servers, and switches necessary to perform analytics. The DCA is a turn-key, easily installed data warehouse solution that provides extreme query and loading performance for analyzing large data sets. The DCA integrates GPDB, data loading, and Hadoop software with compute, storage, and network components. The DCA is delivered racked and ready for immediate data loading and querying. This chapter includes the following sections:
• About the DCA
• DCA Modules and Master Servers
• GPDB Overview and Upgrade Tasks

About the DCA
This section explains the hardware components and specifications of the DCA.
• Two Appliance Versions
• DCA Module Types
• Rack Types
• Rack Density

Two Appliance Versions The DCA 3.3.0.0 software supports all DCAv2 and DCAv3 hardware. A DCAv3 System rack has two Python master servers and one to four GPDB modules, with each module comprised of four Hydra 24 segment servers. Each System rack also has an Arista administration switch and two Arista interconnect switches. Both server types have 256GB of memory and 1.8TB drives; the Python has six drives and the Hydra 24 has 24 drives. Aggregation and Expansion racks use subsets of the System rack components. DCAv2 System, Aggregation, and Expansion racks have the standard 2.x configurations of servers, switches, drives, and memory. A DCAv2 appliance can have GPDB modules, Data Integration Accelerator (DIA) modules, and Hadoop modules.

Note: The DCA 3.3.0.0 software release provides separate sets of documentation for the DCAv3 and DCAv2 appliances. Both sets are available at http://support.emc.com.

DCA Module Types
The DCA is built from required switches, two master nodes for cluster management, and server increments called modules. DCA modules consist of either two or four servers. EMC-supported servers for the DCA are named Dragon 12, Dragon 24, or Kylin. This helps customers and EMC Support to easily identify servers. Read this section for the server types that make up the three available modules:
• GPDB Module
• Data Integration Accelerator (DIA) Modules
• Hadoop Modules

GPDB Modules Server Types and Specifications
Table 1.1 lists the server types and specifications for the GPDB modules.

Table 1.1 GPDB Module Specifications

GPDB Standard Module (introduced in DCA version 2.0.0.0)
• Servers: four Dragon 24 servers
• Disks: twenty-four 900GB drives per server
• Memory: 64GB per server
• Usage: GPDB

GPDB Compute Module (introduced in DCA version 2.0.0.0)
• Servers: four Dragon 24 servers
• Disks: twenty-four 300GB drives per server
• Memory: 64GB per server
• Usage: GPDB

GPDB Hi-Memory Module (introduced in DCA version 2.0.2.0)
• Servers: four Dragon 24 servers
• Disks: twenty-four 300GB drives per server
• Memory: 256GB per server
• Usage: GPDB

DIA Modules Server Types and Specifications
Table 1.2 lists the server types and specifications for the DIA modules.

Table 1.2 DIA Module Specifications

DIA-Kylin 300GB Disk Module (introduced in DCA version 2.0.0.0)
• Servers: two Kylin servers
• Disks: six 300GB drives per server
• Memory: 64GB per server
• Usage: Business Intelligence Tools

DIA 3TB Disk Module (introduced in DCA version 2.0.2.0)
• Servers: two Dragon 12 servers
• Disks: twelve 3TB drives per server
• Memory: 64GB per server
• Usage: Business Intelligence Tools

DIA Hi-Memory Module with 24 HDDs (introduced in DCA version 2.0.2.0)
• Servers: two Dragon 24 servers
• Disks: twenty-four 300GB drives per server
• Memory: 256GB per server
• Usage: Business Intelligence Tools

DIA-Kylin Hi-Memory Module (introduced in DCA version 2.1.0.0)
• Servers: two Kylin servers
• Disks: six 300GB drives per server
• Memory: 256GB per server
• Usage: Business Intelligence Tools


Hadoop Modules Server Types and Specifications
Table 1.3 lists the server types and specifications for the Hadoop modules.

Table 1.3 Hadoop Module Specifications

Hadoop (HD) Module
• Servers: four Dragon 12 servers
• Disks: twelve 3TB drives per server
• Memory: 64GB per server
• Usage: Pivotal Hadoop (master or worker)

Hadoop-Compute (HDC) Module
• Servers: two Kylin servers
• Disks: six 300GB drives per server
• Memory: 64GB per server
• Usage: Pivotal Hadoop with Isilon storage

Hadoop Dragon 12 Hi-Memory Module (introduced in DCA version 2.1.0.0)
• Servers: four Dragon 12 servers
• Disks: twelve 3TB drives per server
• Memory: 256GB per server
• Usage: Pivotal Hadoop and Pivotal HAWQ (worker)

Hadoop Dragon 12 Large Disk Module (introduced in DCA version 2.1.0.0)
• Servers: four Dragon 12 servers
• Disks: twelve 6TB drives per server
• Memory: 256GB per server
• Usage: Pivotal Hadoop and Pivotal HAWQ (worker)

Supported Configurations
The following DCA configurations are supported:

GPDB DCA (can be GPDB-only or a mix of GPDB and other types of servers):
• Requires a minimum of 1 GPDB module in the System Rack occupying the lowest rack position
• A GPDB module is comprised of four Intel 2U 24-drive servers
• Maximum GPDB modules per rack: four modules (sixteen 24-drive servers)
• Hi-memory servers (servers with 256GB memory) allow the following number of server modules per rack:
  – Maximum of 3 server modules per rack with 4 single-phase power drops
  – Maximum of 4 server modules per rack with 6 single-phase power drops

Hadoop-only DCA (applies to DCA version 2.0.1.0 and later):
• Minimum Hadoop configuration: 1 hdw module + 1 hdm module
• A Hadoop Worker module (hdw) is comprised of four 2U Intel 12-drive servers
• A Hadoop Master module (hdm) is comprised of four 2U Intel 12-drive servers

Hadoop Compute configuration:
• Four HDC modules


Minimum GPDB configuration The minimum GPDB-based DCA is comprised of a single GPDB module. The maximum GPDB configuration is 48 modules occupying 12 racks.

Figure 1.1 Minimum GPDB configuration


Hadoop Compute Configuration The minimum Hadoop Compute configuration requires 4 Hadoop Compute modules.

Figure 1.2 Hadoop Compute configuration


Minimum Hadoop-only configuration (Applies to DCA version 2.0.1.0 and later) The minimum Hadoop-only-based DCA is comprised of two modules: a single Hadoop Master module and a single Hadoop Worker module.

Figure 1.3 Minimum Hadoop-only configuration


Racking Guidelines
• GPDB Compute, Standard, or High Memory modules must not occupy the same DCA.
• GPDB Hi-Mem servers are limited to four modules or 12 servers per rack.
• The minimum Hadoop configuration must include two Hadoop modules, one serving as the Hadoop Master module (hdm) and a second serving as the Hadoop Worker (data) module (hdw). For Hadoop Compute with Isilon, the minimum requirement is eight Kylins (four 2-server Hadoop Compute modules).
• The 2nd rack (if present) is always an Aggregation rack. HD-C and DIA-Kylin are limited to a maximum of 10 modules or 20 servers in rack 1 (system rack) and rack 2 (aggregation rack).
• Racks 3 through 11 (if present) are Expansion racks. HD-C and DIA-Kylin are limited to a maximum of 11 modules or 22 servers in expansion racks.
• Any rack containing even one 100-585-055-01 is limited to thirty rack units for servers. Switches remain in the standard locations. Racks with High Memory servers should not exceed 30U.


Figure 1.4 Maximum Configuration: 11 Racks

Rack Types
There are three rack types: system rack, aggregation rack, and expansion rack.
• The system rack contains an admin switch, two interconnect switches, two master servers, and a system tray.
• The aggregation rack contains an admin switch, two interconnect switches, and two aggregation switches. It does not contain a system tray or master servers.
• Expansion racks contain an admin switch and two interconnect switches. They do not contain a system tray, master servers, or aggregation switches.


Figure 1.5 Multiple Rack Types Note: Because of power considerations, racks that have servers with 256GB memory can only have 3 GPDB modules or 3 Hadoop Modules.


Rack Density
Rack density refers to the number of servers possible in a rack. This number is dictated first by the physical space in a rack and next by how much power is delivered to the rack. EMC uses racks with 40 rack units of usable space (a rack unit is 44.45mm or 1.75 inches) and 9600 watts of usable input power. The static hardware in each rack (switches, master nodes, and so on) pulls a maximum of 1250W, leaving 8350W for servers. 2U servers (servers that occupy two rack units) with 64GB of RAM use at most 520W. Servers with 256GB of RAM use at most 600W. 1U servers with 64GB of RAM use at most 430W. Therefore, a 40U rack with 8350W of usable input power can fit the following:
• 16 x 2U servers, each with 64GB RAM (standard memory GPDB/Hadoop nodes)
• 22 x 1U servers, each with 64GB RAM, also known as the Dense Rack (master nodes, DIA nodes, or HD+Isilon)
• 12 x 2U servers, each with 256GB RAM (high memory nodes for DIA or GPDB)
• 18 x 1U servers, each with 256GB RAM (high memory nodes for DIA or HDC)
• Or any combination thereof.
The following diagram shows where servers can be placed in racks. 2U servers should be racked before 1U servers.

Figure 1.6 DCA rack density
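The per-rack power budget described above can be sanity-checked with simple arithmetic. The sketch below uses only the wattage figures from this section; it is illustrative and does not model rack-unit placement or power-drop counts.

    # Rough sanity check of the per-rack power budget (figures taken from the text above).
    usable_watts=$((9600 - 1250))                            # usable input power minus static hardware = 8350W
    echo "Power available for servers: ${usable_watts}W"
    echo "16 x 2U 64GB servers draw about $((16 * 520))W"    # 8320W, within the 8350W budget
    echo "12 x 2U 256GB servers draw about $((12 * 600))W"   # 7200W, within the 8350W budget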


About the Network Configuration Figure 1.7 and Figure 1.8 show examples of how the network is configured in a DCA. The GPDB interconnect and administration networks are configured on a private LAN. Outside access to the GPDB and the DCA systems goes through the master servers.

Figure 1.7 GPDB network configuration



Figure 1.8 Hadoop network configuration

About the GPDB Interconnect Networks The interconnect is the network layer of the GPDB. When a user connects to a database and issues a query, processes are created on each of the segments to handle the query. The interconnect refers to the inter-process communication between the segments as well as the communication’s network infrastructure.


To maximize throughput, the interconnect activity is load-balanced over two interconnect switches. To ensure redundancy, the interconnect switches are configured with the Multi-Chassis Link Aggregation (MLAG) technology. This ensures that the loss of a switch, port, or cable does not affect the availability of the GPDB.

If a cluster contains multiple racks, Aggregation switches are used to connect the interconnect network in each rack. A multiple-rack DCA has two Aggregation switches. Cabling runs from all Interconnect 1 switches to one Aggregation switch and from all Interconnect 2 switches to another Aggregation switch.

About the DCA Administration Network The administration network is used for system management and GPDB administration and does not interfere with the network traffic related to database processing. Each Master and Segment server has one administration/BMC network interface.

About BMC
The baseboard management controller (BMC) is a built-in interface included in most servers that provides out-of-band system management facilities. The controller has its own processor, memory, battery, network connection, and access to the system bus. Key features include power management, virtual media access, and remote console capabilities, all available through a supported web browser. The BMC gives system administrators the ability to manage a machine as if they were sitting at the local console.
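Because the BMC exposes standard IPMI over the administration network, a tool such as ipmitool can also be used from an administration host to query power state or open a remote console. The hostname and credentials below are placeholders, not DCA defaults; this is a hedged sketch of out-of-band access, not a DCA procedure.

    # Example out-of-band management through a server BMC using ipmitool (host and credentials are placeholders).
    ipmitool -I lanplus -H sdw1-bmc.example.com -U admin -P 'secret' chassis power status
    ipmitool -I lanplus -H sdw1-bmc.example.com -U admin -P 'secret' sol activate   # serial-over-LAN console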

DCA Modules and Master Servers
The DCA is comprised of server modules. A module consists of either two, three, or four servers, and the software configuration loaded onto each server determines the module type. The most basic DCA is configured with a pair of master servers (Primary and Standby) and at least one GPDB module. This section includes the following topics:
• Master Servers
• GPDB Modules
• Data Integration Accelerator Modules
• HD Compute Modules
• Hadoop Master and Worker Modules
• About the Master Servers
• About the Segment Hosts


Master Servers Primary and Standby Master servers are the entry point for external connections to the GPDB. The Standby Master server is a warm standby. If the Primary Master server fails, operations failover to the Standby Master server.

Figure 1.9 Master Server disk layout: Root (sda2, 32GB), Swap (sdc1, 32GB), Crash/Cores (sdd1, 32GB), Home (sdb1, 100GB), Data (sde, 918GB), all on RAID Adapter 0

Master Server Specifications:
• Processor: Intel X2660 2.2 GHz (8 core), quantity 2
• Memory: DDR3, 64GB
• Dual-port Converged Network Adapter: 2 x 10 Gb/s, quantity 1
• Quad-port Network Adapter: 4 x 1 Gb/s, quantity 1
• RAID controller: dual channel 6 Gb/s SAS, quantity 1
• Hard Disks: 300GB 10K RPM SAS, quantity 6 (one RAID5 volume of 4+1 with one hot spare)

Master Servers in GPDB Compute and GPDB Standard systems use the same type and number of drives.

GPDB Modules
A GPDB module is a group of servers that host the GPDB. GPDB is always the first module in a DCA. There are three types of GPDB servers:
• Greenplum Database Compute Module: a highly scalable data-analytics appliance module that integrates database, computing, storage, and network into an enterprise-class system.
• Greenplum Database Standard Module: 3x the capacity of the GPDB Compute module, for analyzing extremely large data sets at the same performance level as the Compute module.
• Greenplum Database High Memory Module: 4x the memory capacity per server compared to the GPDB Compute Module, for applications requiring a larger amount of memory.

Warning: GPDB Compute servers, Standard servers, and GPDB Hi-Memory servers must not occupy the same DCA. Once you choose one kind of GPDB server, you cannot mix it with other GPDB servers.

Each GPDB server serves eight GPDB primary segment instances, and eight mirror segment instances.

Figure 1.10 GPDB Modules

GPDB Server Specifications:
• Processor: Intel X2660 2.2 GHz (8 core), quantity 2
• Memory: DDR3, 64GB or 256GB
• Dual-port Converged Network Adapter: 2 x 10 Gb/s, quantity 1
• Quad-port Network Adapter: 4 x 1 Gb/s, quantity 1
• RAID controller: dual channel 6 Gb/s SAS, quantity 1
• Hard Disks: Compute: 300GB 10K RPM SAS; Standard: 900GB 10K RPM SAS; quantity 24


Data Integration Accelerator Modules DIA modules are high capacity loading servers. DIA modules are pre-configured with the gpfdist and gpload software to allow data to be loaded easily into GPDB modules.
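As a sketch of how a DIA server is typically used, the gpfdist file server can be started on the DIA node and an external table defined in GPDB that points at it. The hostnames, directory, port, and table definitions below are illustrative assumptions, not DCA defaults.

    # On a DIA server: serve staged load files over HTTP with gpfdist (path and port are examples).
    gpfdist -d /data/staging -p 8081 -l /tmp/gpfdist.log &

    # On the Master server: define an external table over the gpfdist location, then load from it.
    psql -d mydb -c "CREATE EXTERNAL TABLE ext_sales (sale_id int, amount numeric, sale_date date)
        LOCATION ('gpfdist://etl1:8081/sales*.csv') FORMAT 'CSV' (HEADER);"
    psql -d mydb -c "INSERT INTO sales SELECT * FROM ext_sales;"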

Figure 1.11 Kylin High Memory DIA module

Table 1.4 Kylin DIA Server Specifications
• Processor: Intel X2660 2.2 GHz (8 core), quantity 2
• Memory: DDR3, 64GB or 256GB
• Dual-port Converged Network Adapter: 2 x 10 Gb/s, quantity 1
• Quad-port Network Adapter: 4 x 1 Gb/s, quantity 1
• RAID controller: dual channel 6 Gb/s SAS, quantity 1
• Hard Disks: DIA: 300GB 10K RPM SAS, quantity 6

Note: In a Hadoop Compute node, the 6 hard disks contain the operating system and DCA software.


Figure 1.12 Dragon 12 DIA module

Table 1.5 Dragon 12 DIA Server Specifications
• Processor: Intel X2660 2.2 GHz (8 core), quantity 2
• Memory: DDR3, 64GB
• Dual-port Converged Network Adapter: 2 x 10 Gb/s, quantity 1
• Quad-port Network Adapter: 4 x 1 Gb/s, quantity 1
• RAID controller: dual channel 6 Gb/s SAS, quantity 1
• Hard Disks: DIA: 3TB 7.2K RPM SATA, quantity 12


Figure 1.13 Dragon 24 High Memory DIA Module

Table 1.6 Dragon 24 High Memory DIA Server Specifications
• Processor: Intel X2660 2.2 GHz (8 core), quantity 2
• Memory: DDR3, 256GB
• Dual-port Converged Network Adapter: 2 x 10 Gb/s, quantity 1
• Quad-port Network Adapter: 4 x 1 Gb/s, quantity 1
• RAID controller: dual channel 6 Gb/s SAS, quantity 2
• Hard Disks: DIA: 300GB 10K RPM SAS, quantity 24


HD Compute Modules Hadoop Compute modules are designed to conduct Hadoop computing while data may be stored on optional EMC® Isilon® storage.

Figure 1.14 Hadoop Compute Module

Table 1.7 Hadoop Compute Server Specifications
• Processor: Intel X2660 2.2 GHz (8 core), quantity 2
• Memory: DDR3, 64GB
• Dual-port Converged Network Adapter: 2 x 10 Gb/s, quantity 1
• Quad-port Network Adapter: 4 x 1 Gb/s, quantity 1
• RAID controller: dual channel 6 Gb/s SAS, quantity 1
• Hard Disks: 300GB 10K RPM SAS, quantity 6


Hadoop Master and Worker Modules Pivotal HD Enterprise is an enterprise-capable, commercially supported distribution of packages targeted to traditional Hadoop deployments. The Hadoop Master and Worker modules in the DCA are configured with Pivotal HD Enterprise and are ready for high-performance, unstructured data queries.

DCA 3.3.0.0 Hadoop Configuration
In DCA version 3.3.0.0, Hadoop modules run many services, including:
• HDFS
• YARN
• ZooKeeper
• HBase
• Hive
• HAWQ
• PXF
For a complete list of services, refer to the Pivotal HD Installation and Administrator Guide on http://docs.pivotal.io/.

Figure 1.15 DCA version 3.3.0.0 Hadoop Master and Worker High Memory modules (Hadoop Master server and Hadoop Worker server)

Table 1.8 Hadoop Server Specifications
• Processor: Intel X2660 2.2 GHz (8 core), quantity 2
• Memory: DDR3, 64GB or 256GB
• Dual-port Converged Network Adapter: 2 x 10 Gb/s, quantity 1
• Quad-port Network Adapter: 4 x 1 Gb/s, quantity 1
• RAID controller: dual channel 6 Gb/s SAS, quantity 1
• Hard Disks: 3TB 7.2K RPM SATA, quantity 12

Note: For information on Hadoop configurations for DCA software releases earlier than 3.3.0.0, refer to the appropriate DCAv2 Data Computing Appliance Getting Started Guide available on http://support.emc.com.

GPDB Overview and Upgrade Tasks

About GPDB
GPDB is a massively parallel processing (MPP) database management system (DBMS). GPDB 4.2 and later uses MPP as the backbone of its database architecture. MPP refers to a distributed system comprised of two or more individual servers that carry out an operation in parallel. Each server has its own processors, memory, operating system, and storage, and all servers communicate with each other over a common network. A single database system can therefore use the combined computational performance of all the individual MPP servers to provide a powerful, scalable database system. GPDB uses this high-performance system architecture to distribute the load of multi-terabyte data warehouses, and is able to use all of a system's resources in parallel to process a query.

GPDB is based on PostgreSQL 8.2.14, and in most cases is very similar to PostgreSQL with regards to SQL support, features, configuration options, and end-user functionality. Database users interact with GPDB as they would with a regular PostgreSQL DBMS.

GPDB handles the storage and processing of large amounts of data by distributing the load across several servers, or hosts. The master is the entry point to the GPDB system. It is the database instance that clients use to connect and submit SQL statements. The DCA comes with two master hosts: a primary master and a standby master. The master coordinates the work across the segments (the other database instances in the system), which handle data processing and storage. The DCA comes with a configurable number of segment hosts. Each segment host serves six primary and six mirror GPDB segment instances.

The segments communicate with each other and with the master over the interconnect, which is the network layer of GPDB. The DCA interconnect is configured on a private LAN and utilizes two high-speed network switches, offering each segment host 20 Gb/s of non-blocking duplex bandwidth. The GPDB primary and mirror segments use different interconnect switches to provide redundancy in case of a single switch failure.


In addition to the interconnect switches, DCA comes with an additional administration switch. Each master and segment server has a dedicated interface for remote system administration. This controller has its own processor, memory, battery, and network connection. This allows administrators to access the individual DCA servers as if they were at the local console (terminal).

Figure 1.16 High-Level GPDB Architecture

About the Master Servers
The master is the entry point to the GPDB system from the public LAN. Systems that use automated Master server failover have a virtual IP configured, and client tools should point to this IP. The postgres database process accepts client connections and processes the SQL commands issued by users. Users connect to GPDB through the Master server using PostgreSQL-compatible client programs such as psql or ODBC.

The Master server maintains the system catalog, a set of system tables that contain metadata about the GPDB system itself. However, the Master server does not contain any user data; data resides only on the segments. The Master server authenticates client connections, processes and plans the incoming SQL commands, distributes the work load between the segments, coordinates the results returned by each of the segments, and presents the final results to the client program.
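For example, a client on the public LAN might connect through the Master (or through the virtual IP, if failover is configured) with psql; the hostname, database, and role below are placeholders.

    # Connect to GPDB through the Master server (or the VIP) from a client machine.
    psql -h mdw.example.com -p 5432 -U gpadmin -d mydb -c "SELECT version();"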


Master Redundancy—The Standby Master
The DCA also has a Standby Master server to serve as a warm standby in case the Primary Master becomes inoperable. The Standby Master can be set up to promote itself automatically to the role of acting Primary Master if the original Primary Master (mdw) fails. Automatic master server failover is enabled by default. Primary and Standby Master servers are kept in sync by a transaction log replication process that runs on the Standby Master. If the Primary Master fails, the log replication process shuts down, and the Standby Master can be activated in its place. When the Standby Master is activated, it uses the replicated logs to reconstruct the state of the Primary Master server at the time of its last successfully committed transaction.
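If automatic failover is not in effect and the standby must be activated by hand, Greenplum provides the gpactivatestandby utility. The sketch below assumes the standard master data directory location and is a simplified illustration; consult the GPDB documentation before running it.

    # On the Standby Master host: promote the standby to act as the Primary Master (simplified sketch).
    gpactivatestandby -d /data/master/gpseg-1
    gpstate -s    # verify the cluster state after activation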

About the Segment Hosts In GPDB, the database data is stored in the segments, where the majority of query processing occurs. User-defined tables and their indexes are distributed across the available number of segments in the GPDB system, each segment containing a distinct portion of the data. Segment instances are the database server processes that serve segments. Users and administrators do not interact directly with the segments in a GPDB system, but do so through the Master server.
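How a table's rows are spread across the segments is chosen when the table is created; a hedged example follows, with table and column names that are illustrative only.

    # The distribution key chosen at CREATE TABLE time determines how rows are hashed across the primary segments.
    psql -d mydb -c "CREATE TABLE sales (sale_id bigint, store_id int, amount numeric)
        DISTRIBUTED BY (sale_id);"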

Data Redundancy—Mirror Segments GPDB provides data redundancy by deploying mirror segments. The mirror segments allow database queries to fail over to a backup segment if the primary segment becomes unavailable. A mirror segment always resides on a different server than its corresponding primary segment. A GPDB system can remain operational if a segment server, network interface, or interconnect switch goes down as long as all the data is available on the remaining active segments. During database operations, only the primary segment is active. Changes to a primary segment are copied over to its mirror using a file block replication process. Unless a failure occurs on the primary segment, no live segment instance runs on the mirror host, only the replication process.

Figure 1.17 Data Mirroring in GPDB

If a segment fails, the file replication process stops and the mirror segment automatically starts as the active segment instance. All database operations then continue using the mirror. While the mirror is active, it logs all transactional changes made to the database. When the failed segment is ready to be brought back online, administrators initiate a recovery process that returns it to operation.
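Administrators commonly check mirror status and start this recovery with the gpstate and gprecoverseg utilities; this is a brief sketch, and the full procedure is in the GPDB Administration Guide.

    # Run as gpadmin on the Master server.
    gpstate -m       # show the status of mirror segments
    gprecoverseg     # recover failed segments
    gprecoverseg -r  # optionally rebalance segments to their preferred roles afterwards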


GPDB Upgrade Tasks Note: Upgrading to DCA version 2.0.3.0 and later requires GPDB 4.3.x and above. Customers who wish to remain on earlier versions of GPDB cannot upgrade to version 2.0.3.0 and later.

Starting with the GPDB 4.3.4.1 release, DCA customers who have registered with Pivotal Support can download GPDB releases from the Pivotal web site (https://network.pivotal.io). See the Pivotal Greenplum Database 4.3.4.1 Release Notes or later versions for detailed instructions on upgrading GPDB software on the DCA. The minimum recommended upgrade path is from GPDB version 4.2.x.x. If you have an earlier major version of the database, you must first upgrade to version 4.2.x.x. Before you start the upgrade, perform the checks recommended in the release notes and resolve any issues with the environment. If you have any questions, go to https://support.emc.com.
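Before starting an upgrade, it is useful to record the running database version and overall cluster health; the minimal sketch below (run as gpadmin on the Master) is not a substitute for the checks listed in the release notes, and the output path is an example.

    # Capture the current GPDB version and cluster state before upgrading.
    psql -d postgres -c "SELECT version();"
    gpstate -s > /home/gpadmin/gpstate_before_upgrade.out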



2. Supported Software Applications

The DCA modular architecture enables customers to add new modules to support optional software products. The following optional software applications are supported by the DCA:
• GPDB
• Pivotal Greenplum Command Center
• Pivotal Hadoop
• HAWQ
• Pivotal HD with EMC Isilon
• Pivotal Command Center

GPDB
GPDB is a massively parallel processing (MPP) database server that supports next generation data warehousing and large-scale analytics processing. By automatically partitioning data and running parallel queries, it allows a cluster of servers to operate as a single database supercomputer, performing tens or hundreds of times faster than a traditional database. It supports SQL, MapReduce parallel processing, and data volumes ranging from hundreds of gigabytes to hundreds of terabytes.

Pivotal Greenplum Command Center Pivotal Greenplum Command Center (GPCC) is a management tool for Pivotal's GPDB. GPCC monitors system performance metrics, system health, and also provides administrators the ability to perform management tasks such as start, stop, and recovery of systems for GPDB. GPCC is an interactive graphical web application that can be installed on a web server on the master host, and used to view and interact with the collected system data from the GPDB and optionally from the DCA.

Pivotal Hadoop
Pivotal HD Enterprise is an enterprise-capable, commercially supported distribution of Apache Hadoop 2.2 packages targeted to traditional Hadoop deployments. Pivotal HD Enterprise enables you to take advantage of big data analytics without the overhead and complexity of a project built from scratch. Pivotal HD Enterprise is Apache Hadoop that allows users to write distributed processing applications for large data sets across a cluster of commodity servers using a simple programming model. This framework automatically parallelizes Map Reduce jobs to handle data at scale, thereby eliminating the need for developers to write scalable and parallel algorithms.


HAWQ
HAWQ extends the functionality of Pivotal Hadoop (HD) Enterprise, adding rich, proven parallel SQL processing facilities. These SQL processing facilities enhance productivity, rendering Hadoop queries faster than any Hadoop-based query interface on the market. HAWQ enables SQL processing of a variety of Hadoop-based data formats using the Pivotal Extension Framework (PXF), without duplicating or converting source files. HAWQ is a parallel SQL query engine with the scalability and convenience of Hadoop. Using HAWQ functionality, you can interact with petabyte-range data sets. HAWQ provides users with a complete, standards-compliant SQL interface. HAWQ consistently performs tens to hundreds of times faster than all Hadoop query engines in the market.

Pivotal HD with EMC Isilon
Combining Pivotal HD with the scalability and capacity of Isilon storage enables you to take advantage of big data analytics quickly and simply. Pivotal HD is enterprise-ready Apache Hadoop that allows users to write distributed processing applications for large data sets across an Isilon cluster, using a simple programming model. This framework automatically parallelizes Map Reduce jobs to handle data at scale, thereby eliminating the need for developers to write scalable and parallel algorithms. Configuring Pivotal HD on Isilon storage clusters exponentially increases the amount of data you can use to create business insights. The PHD - Isilon solution allows PHD compute services such as MapReduce and HBase to run on DCA "hdc" compute nodes, but moves the HDFS file storage to the Isilon cluster. This configuration makes it easier to run a large Hadoop cluster by simplifying data import and export and by separating disk management from the compute cluster. Use of Isilon brings high availability of namenode services to Hadoop by eliminating the HDFS namenode as a single point of failure. Isilon's remote replication and snapshot capabilities bring additional enterprise capabilities to Hadoop.

Pivotal Command Center Pivotal Command Center (PCC) is a multi-tier graphical web application that allows you to configure, deploy, monitor, and manage your Pivotal HD clusters. PCC enables administrators to view aggregated and non-aggregated system metrics data as well as Hadoop specific metrics for a selected cluster. Users are also able to analyze and gain insights on their cluster by drilling down into specific services or categories of nodes. Metrics are provided on how a cluster is performing in real-time and trending over time.

Supported software application versions The following table lists supported software application versions in DCA 3.3.0.0.

Table 2.1 Supported Software Application Versions

Supported Software / Versions Supported
• GPDB: 4.3.5.2
• GPCC: 1.3.0.1
• HDP: 2.5
• PHD: 2.1, 3.0
• PCC: 2.3
• HAWQ: 1.2.1
• Isilon: 7.2

Of the applications listed in the table, GPCC is part of the GPDB installation package, and PCC and HAWQ are part of the PHD Application Suite installation package.

Note: When installing Pivotal application software, always check https://network.pivotal.io for the latest versions.


3. Preparing the Data Center Environment

• Confirming Site Requirements
• Optional Securing Brackets
• Cabinet Positioning
• Package Dimensions and Clearance

Confirming Site Requirements
This section summarizes the site requirements for the DCA.
• Floor Space Requirements
• DCA Rack Dimensions
• Connecting New Racks to the Power Supply
• Power Cord Specifications
• Environmental Requirements
• Air Quality Requirements

Floor Space Requirements
The following table describes the physical footprint of the DCA. A multiple-rack DCA is comprised of a System and an Aggregation rack, and possibly one or more Expansion racks:
• System rack (1st position)
• Aggregation rack (2nd position)
• Expansion rack(s) (if any; 3rd to 11th positions)

Table 3.1 Physical Dimensions (approximate)

All rack types measure 75 in (190 cm) high, 24 in (61 cm) wide, and 42 in (107 cm) deep.

• System rack (1/4 rack): 900 lbs (410 kg)
• System rack (1/2 rack): 1125 lbs (510 kg)
• System rack (full rack): 1600 lbs (726 kg)
• Aggregation rack (full rack): 1550 lbs (703 kg)
• Expansion rack (full rack): 1525 lbs (692 kg)


DCA Rack Dimensions
A multiple-rack DCA is comprised of more than one rack, each with various modules. These can be loosely organized into System, Aggregation, or Expansion racks. For example, a multi-rack cluster has at least one System rack, one Aggregation rack, and from zero to nine Expansion racks. Each rack type has a set group of required hardware installed. This means that while each rack has the same height, width, and depth, the base weight and power draw varies somewhat. The physical dimensions for all the rack types are:
• Height: 40 Rack Units, 190 cm (75 inches)
• Width: 61 cm (24 inches)
• Depth: 107 cm (42 inches)

Determining DCA Power Requirements Based on Weight
The weight and power draw of each rack depend on the hardware installed: the rack's required hardware plus any modules added to the rack. Table 3.2 and Table 3.4 list these items separately for standard memory and high memory configurations, respectively. To get a particular configuration's specification, add the component values together; a worked example follows Table 3.5. Table 3.3 and Table 3.5 list example power requirements for common DCA configurations with standard memory and high memory, respectively.

Power Connection Requirements when Plugging in a New Rack If your DCA cluster is comprised of more than one module (four servers) in a rack then four power cords are required. If there is a single module in a rack (four servers) then two power cords are required.

Note: High memory modules can be installed in racks with four power drops, but are limited to 30U, or 12 total high-memory servers. The total rack power for this configuration will not exceed the maximum full rack power numbers listed for standard memory with four modules installed.


Standard memory, 64GB servers, four power drops

Table 3.2 Standard Memory Weight and Power Requirements

Weight Max Power Draw BTU

Empty System Rack (2 master nodes, 3 switches) 295kg (~655lbs) 1290VA 4402

Empty Aggregation Rack (5 switches) 280kg (~620lbs) 750VA 2559

Empty Expansion Rack (3 switches) 260kg (~575lbs) 450VA 1535

GPDB Module (any disk size, standard memory) 102kg (~225lbs) 2080VA 7097

GPHD/Hadoop Master 109kg (~240lbs) 2004VA 6837

GPHD/Hadoop Worker 109kg (~240lbs) 2004VA 6837

HD Compute (Hadoop+Isilon) 40kg (~80lbs) 824VA 2811

DIA (2 x 1U server) 40kg (~80lbs) 824VA 2811

DIA (2 x 2U server, 12 disks, standard memory) 55kg (~125lbs) 1002VA 3418

Table 3.3 Standard Memory Weight and Power Requirements for Common Configurations

Weight Max Power Draw BTU

1/4 Rack GPDB (minimum config) 397kg (~880lbs) 3384VA 11546

1/2 Rack GPDB 499kg (~1105lbs) 5464VA 18643

3/4 Rack GPDB 601kg (~1330lbs) 7544VA 25740

Full Rack GPDB 703kg (~1555lbs) 9624VA 32837

1/2 Rack Hadoop (minimum config) 513kg (~1135lbs) 5312VA 18123

3/4 Rack Hadoop 622kg (~1375lbs) 7316VA 24960

Full Rack Hadoop 731kg (~1615lbs) 9320VA 31797

1/4 Rack GPDB + 3/4 Rack Hadoop 724kg (~1600lbs) 9396VA 32057

1/2 Rack HD Compute (minimum config) 455kg (~975lbs) 4600VA 15693

Full Rack HD Compute (10 HDC modules) 695kg (~1455lbs) 9544VA 32559


High memory, 256GB, six power drops

Table 3.4 High Memory Weight and Power Requirements

Weight Max Power Draw BTU

Empty System Rack (2 master nodes, 3 switches) 295kg (~655lbs) 1330VA 4538

Empty Aggregation Rack (5 switches) 280kg (~620lbs) 750VA 2559

Empty Expansion Rack (3 switches) 260kg (~575lbs) 450VA 1535

GPDB Module (any disk size, high memory) 102kg (~225lbs) 2416VA 8244

GPHD/Hadoop Master Module (4 x 1U) 80kg (~160lbs) 1680VA 5732

GPHD/Hadoop Master Module (4 x 2U, 12 disks) 109kg (~240lbs) 2112VA 7206

GPHD/Hadoop Worker 109kg (~240lbs) 2112VA 7206

HD Compute (Hadoop+Isilon) 40kg (~80lbs) 840VA 2866

DIA (2 x 1U server) 40kg (~80lbs) 840VA 2866

DIA (2 x 2U server, 24 disks, high memory) 52kg (~115lbs) 1208VA 4122

Table 3.5 High Memory Weight and Power Requirements for Common Configurations

Weight Max Power Draw BTU

1/4 Rack GPDB (minimum config) 397kg (~880lbs) 3746VA 12781

1/2 Rack GPDB 499kg (~1105lbs) 6162VA 21025

3/4 Rack GPDB 601kg (~1330lbs) 8578VA 29268

Full Rack GPDB 703kg (~1555lbs) 10994VA 37512

1/2 Rack Hadoop (minimum config) 513kg (~1135lbs) 5554VA 18950

3/4 Rack Hadoop 622kg (~1375lbs) 7666VA 26156

Full Rack Hadoop 731kg (~1615lbs) 9778VA 33363

1/4 Rack GPDB + 3/4 Rack Hadoop 724kg (~1600lbs) 10082VA 34400

1/2 Rack HD Compute (minimum config) 455kg (~975lbs) 5530VA 18868

Full Rack HD Compute (10 HDC modules) 695kg (~1455lbs) 9730VA 33199
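As an illustration of how the component rows combine, the sketch below estimates a 1/2 Rack GPDB (standard memory) from the Table 3.2 rows. Weights add exactly, while the published power totals in Table 3.3 include a small fixed overhead beyond the simple sum; this is an approximation, not a sizing tool.

    # Estimate a 1/2 Rack GPDB (standard memory) from the component rows in Table 3.2.
    weight_kg=$((295 + 2 * 102))     # empty System rack + two GPDB modules = 499 kg (matches Table 3.3)
    power_va=$((1290 + 2 * 2080))    # 5450 VA; Table 3.3 lists 5464 VA, which includes a small fixed overhead
    echo "Estimated: ~${weight_kg} kg, ~${power_va} VA"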


Connecting New Racks to the Power Supply When installing a new rack, the power source must be connected. When upgrading a rack with one module to two, three or four modules, the power distribution panel (PDP) to power distribution unit (PDU) connections may need to be re-routed. The customer power feeds connect to PDPs which feed PDUs. The switches and servers connect to PDUs.

Power Cord Specifications

Table 3.6 Power Cord Specifications

USA, Japan:
• DCA2-US-15: DCA - Single Phase, 24 Amp, 15 ft, L6-30P connector, Black 4PPP
• DCA2-US-15: DCA - Single Phase, 24 Amp, 15 ft, L6-30P connector, Gray 4PPP

Australia:
• DCA2-ASTL-15: DCA - Single Phase, 24 Amp, 15 ft, 56PA332 connector, Black 4PPP
• DCA2-ASTL-15: DCA - Single Phase, 24 Amp, 15 ft, 56PA332 connector, Gray 4PPP

Other countries:
• DCA2-IEC3-15: DCA - Single Phase, 24 Amp, 15 ft, IEC309P connector, Black 4PPP
• DCA2-IEC3-15: DCA - Single Phase, 24 Amp, 15 ft, IEC309P connector, Grey 4PPP

Other power cord types:
• DCA2-RUS-15: DCA - Single Phase, 24 Amp, 15 ft, 3750DP connector, Black 4PPP
• DCA2-RUS-21: DCA - Single Phase, 24 Amp, 15 ft, 3750DP connector, Grey 4PPP

Environmental Requirements

Table 3.7 Environmental Requirements
• Site temperature: +15°C to +32°C (59°F to 89.6°F)
• Relative humidity: 40% to 55%
• Operating altitude: 0 to 2,439 meters (0 to 8,002 feet) above sea level
• Recommendation: Do not exceed 6 consecutive months of unpowered storage.


Air Quality Requirements
EMC products are designed to be consistent with the requirements of the American Society of Heating, Refrigeration and Air Conditioning Engineers (ASHRAE) Environmental Standard Handbook and the most current revision of Thermal Guidelines for Data Processing Environments, Second Edition, ASHRAE 2009b.

The data center should maintain a cleanliness level as identified in ISO 14644-1, class 8 for particulate dust and pollution control. The air entering the data center should be filtered with a MERV 11 filter or better. The air within the data center should be continuously filtered with a MERV 8 or better filtration system. In addition, efforts should be maintained to prevent conductive particles, such as zinc whiskers, from entering the facility.

The allowable relative humidity level is 20 to 80% non-condensing; however, the recommended operating environment range is 40 to 55%. For data centers with gaseous contamination, such as high sulfur content, lower temperatures and humidity are recommended to minimize the risk of hardware corrosion and degradation. In general, humidity fluctuations within the data center should be minimized. It is also recommended that the data center be positively pressured and have air curtains on entry ways to prevent outside air contaminants and humidity from entering the facility.

For facilities below 40% relative humidity, it is recommended to use grounding straps when contacting the equipment to avoid the risk of electrostatic discharge (ESD), which can harm electronic equipment.

As part of an ongoing monitoring process for the corrosiveness of the environment, it is recommended to place copper and silver coupons (per ISA 71.04-1985, Section 6.1 Reactivity) in airstreams representative of those in the data center. The monthly reactivity rate of the coupons should be less than 300 Angstroms. If the monitored reactivity rate is exceeded, the coupons should be analyzed for material species and a corrective mitigation process put in place.


This EMC® cabinet ventilates from front to back; you must provide adequate clearance to service and cool the system. Depending on component-specific connections within the cabinet, the available power cord length may be somewhat shorter than the 15-foot standard.

Figure 3.1 Access and Ventilation Requirements

Optional Securing Brackets
If you intend to secure the optional stabilizer brackets to your site floor, prepare the location for the mounting bolts. The additional brackets help to prevent the cabinet from tipping while you service cantilevered levels, or from rolling during minor seismic events. The brackets provide three levels of protection for stabilizing the unit:
• Anti-Tip Bracket
• Anti-Move Bracket
• Seismic Restraint Bracket


Anti-Tip Bracket Use this bracket to provide an extra measure of anti-tip security. One or two kits may be used. For cabinets with components that slide, EMC recommends that you use two kits.

Figure 3.2 Anti-Tip Bracket Placement

Anti-Move Bracket Use this bracket to permanently fasten the unit to the floor.

Figure 3.3 Anti-Move Bracket Placement


Seismic Restraint Bracket Use this bracket to provide the highest protection from moving or tipping.

Figure 3.4 Seismic Restraint Bracket Placement


Cabinet Positioning The cabinet bottom includes four caster wheels. The front wheels are fixed; the two rear casters swivel in a 1.75-inch diameter. Swivel position of the caster wheels will determine the load-bearing points on your site floor, but does not affect the cabinet footprint. Once you have positioned, leveled, and stabilized the cabinet, the four leveling feet determine the final load-bearing points on your site floor.

Figure 3.5 Cabinet Positioning

When the cabinet is centered over two typical 24 in. (60.96 cm) by 24 in. (60.96 cm) floor tiles:
• Cutouts should be 8 in. (20.32 cm) by 6 in. (15.24 cm).
• Cutouts should be centered on the tiles, 9 in. (22.86 cm) from the front and rear and 8 in. (20.32 cm) from the sides.


Package Dimensions and Clearance Make certain your doorways and elevators are wide enough and tall enough to accommodate the shipping pallet and cabinet. Use a mechanical lift or pallet jack to position the packaged cabinet in its final location.

Figure 3.6 Door Clearance

Leave approximately 2.43 meters (8 feet) of clearance at the back of the cabinet to unload the unit and roll it off the pallet.

Figure 3.7 Unloading Clearance


4. Planning for a Multiple Rack DCA

This chapter contains information required to plan for a multiple rack DCA. For best results, plan for cabling based on the size of the DCA. A 2-to-6 rack cluster requires different cabling than a 7-to-11 rack cluster.

Cable Kits There are several cable kits of differing cable lengths for multi-rack connectivity. Use the table below to determine and order the kit most suitable for the customer’s environment.

Kit Name: DCA2-CBL10
• 100-585-048, quantity 16: ARISTA 10GBASE-SRL SFP+ OPTIC MODULE
• 038-003-733, quantity 8: 10m LC to LC Optical 50 Micron MM Cable Assemblies
• 038-003-476, quantity 2: 25' CAT6 Ethernet Cable

Kit Name: DCA2-CBL30
• 100-585-048, quantity 16: ARISTA 10GBASE-SRL SFP+ OPTIC MODULE
• 038-003-740, quantity 8: 30m LC to LC Optical 50 Micron MM Cable Assemblies
• 038-003-475, quantity 2: 100' CAT6 Ethernet Cable

11 Rack Cable Kits
Use the following table to plan for the installation or expansion of a 7-to-11 rack cluster.

Connect from Rack 2 - AGGREG to:
• Rack 1 - SYSRACK: use cable kit DCA2-CBL10
• Rack 3 - 1st EXPAND: DCA2-CBL10
• Rack 4 - 2nd EXPAND: DCA2-CBL10
• Rack 5 - 3rd EXPAND: DCA2-CBL10
• Rack 6 - 4th EXPAND: DCA2-CBL10
• Rack 7 - 5th EXPAND: DCA2-CBL30
• Rack 8 - 6th EXPAND: DCA2-CBL30
• Rack 9 - 7th EXPAND: DCA2-CBL30
• Rack 10 - 8th EXPAND: DCA2-CBL30
• Rack 11 - 9th EXPAND: DCA2-CBL30

5. Gathering Site-Specific Information

To complete the DCA installation, gather the following information from the customer's network and database personnel. The following sections are included:
• Site Requirements Checklist
• VLAN Overlay
• Planning for Remote Support - ESRS and Dialhome

Site Requirements Checklist

Table 5.1 Site-Specific Information

External IP and hostname of the Primary Master: This is the IP address and hostname that the customer will use to connect to the Primary Master server from their public LAN. The Master hostname is also used for client connections to Greenplum Database.

External IP and hostname of the Standby Master: This is the IP address and hostname that the customer will use to connect to the Standby Master server from their public LAN.

External IP and hostname of the Hawk Master Node 2: This is the IP address and hostname that the customer will use to connect to the Hawk Master server from their public LAN.

Virtual IP Address: A Virtual IP Address (VIP) is required in order to use Master server failover features. A VIP will simply be a third external IP address. If the Primary Master server fails, the VIP is transferred from the Primary to the Standby Master server. The VIP is the IP address to which client tools should connect. If the subnet and gateway for the VIP differ from the other external IP addresses, this should be collected also.

External IP and hostname for DIA servers: An IP address and hostname is required for each DIA server. The source data will be transferred to the DIA through this connection. Four IP addresses and hostnames are required per DIA module.

Netmask: Netmask of the customer's network.

Gateway: Default gateway of the customer's network and the IP address and interface name of the router.

NTP server IP: The IP address or hostname of the customer's preferred NTP (Network Time Protocol) server.

DNS name server IP: The IP address of the customer's DNS name server.

BMC password: The password used for remote access to the Primary Master, Standby Master and Segment servers using Intel's Baseboard Management Controller (BMC) interface. The default BMC password is sephiroth.

root password: Customer-supplied root password for the Primary and Standby Master servers and Segment servers. The default root password is changeme.

gpadmin password: Customer-supplied Greenplum Database superuser password. The default gpadmin password is changeme.


System locale The preferred locale to be used on the Primary Master, Standby Master and Segment servers. en_US.UTF-8 is the default locale for the DCA (U.S. English and Unicode character set encoding). A locale identifier consists of a language identifier and a region identifier, and optionally a character set encoding. For example, sv_SE is Swedish as spoken in Sweden, en_US is U.S. English, and fr_CA is French Canadian. If more than one character set encoding can be useful for a locale, then the specifications look like this: en_US.UTF-8 (locale specification and character set encoding). System timezone The local timezone to be used on the Master, Standby Master and Segment servers. The default timezone is PST.

Database character set encoding: UNICODE (UTF-8) is the default character set encoding for Greenplum Database (server-side encoding). This is usually the best choice, because it allows the customer to store all possible Unicode characters from any language. But if all data being stored is from a single language (now and in the future), there is a slight storage space penalty compared to an encoding specific to that language. If space savings is important to the customer, consider Latin-1, Latin-9, or WIN1252 for US or Western European installations, since those encodings use a single byte per character. Likewise, in Thailand you might consider WIN874 to store Thai, because it uses a single byte per character. However, keep in mind that doing so prevents the customer from storing any data outside those character sets. Even in the US or Western Europe, customers might find that some of their data is Latin-1, while some is Latin-9 or WIN1252, so any choice of single-byte encoding will not accommodate all of their data needs. See the Greenplum Database Administration Guide for a list of all supported character set encodings.

Software Tools: Connection to the DCA for setup and management requires an SSH utility. EMC recommends PuTTY or Cygwin.


Hardware Tools: The following hardware tools are required during installation of the DCA:
• Utility knife
• 9/16" socket wrench
• ESD (electro-static discharge) kit

Power Connection for Service Laptop: Power for external devices should not be drawn from the DCA cabinet. A power connection is required for the EMC personnel service laptop. The connection should be a standard AC 100-240V~1.5A, 50-60 Hz outlet.

Dial-home and ESRS Connectivity: The DCA supports dial-home for event notification to the EMC Global Services support center. Communication from the DCA to EMC is done via FTPS. Firewall access should be set up to allow FTPS traffic from the DCA's external IP address to the following EMC addresses: corpusfep3.emc.com and corpusfep4.emc.com. The DCA can also be supported remotely through an ESRS Gateway. If the DCA is to be set up in an environment with ESRS, the Gateway IP address should be identified prior to installation. The DCA supports FTP, SMTP, and HTTPS connection types to the ESRS Gateway. Default port numbers:
• ConnectEMC — 989; 990
• Passive FTP port range — 20000-30000

Plan for Hadoop Networking

Hadoop modules have specific networking requirements. It is important to plan these requirements with the customer prior to an installation. Hadoop services cannot be started without the proper networking configuration.

Table 5.2 Default Hadoop Ports

Port number Application

50070 Name Node/HDFS Web Interface

50030 Job Tracker

50060 Task Tracker

8020 HDFS Default Port

60010 HBase

2181 Zookeeper

8021 MapReduce port
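
As a planning aid, a quick reachability check against these default ports can be scripted once the Hadoop services are up. The sketch below uses bash's built-in /dev/tcp; the host names hdm1 and hdw1 are placeholders for the customer's actual Hadoop Master and Worker hosts.

#!/bin/bash
# Sketch only: verify that the default Hadoop ports in Table 5.2 are reachable.
# Host names are placeholders; substitute the actual Hadoop Master/Worker hosts.
HOSTS="hdm1 hdw1"
PORTS="50070 50030 50060 8020 60010 2181 8021"
for host in $HOSTS; do
  for port in $PORTS; do
    if timeout 3 bash -c "cat < /dev/null > /dev/tcp/$host/$port" 2>/dev/null; then
      echo "$host:$port reachable"
    else
      echo "$host:$port NOT reachable"
    fi
  done
done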

VLAN Overlay

The VLAN overlay is the most commonly used method to provide external access into or out of the DCA's non-master nodes. This configuration method imposes an additional VLAN on the Interconnect network interfaces of some or all of a DCA's servers, providing a logical route to some or all of a customer's external systems.


The goal of this topology is to avoid exposing the internal DCA networks to external systems in a way that might introduce IP address conflicts for sites with multiple DCAs. It also allows multiple DCAs to interact with the same external resources, such as backup destinations and data sources.

Figure 5.1 illustrates one possible topology for the overlay. The drawing shows the logical relationship between a full rack DCA and four servers on the customer network, and demonstrates one of several possible configurations for the overlay. In this case, a dedicated VLAN (1000) is configured and some of the DCA servers are included in both the new VLAN and the internal VLAN (4). Servers etl1 through etl4 have two IP addresses on their interconnect interfaces: one on VLAN 4 and another on VLAN 1000 in a different subnet. The customer systems shown would also have an IP address that is part of the subnet using VLAN 1000, but nothing on VLAN 4. This allows these systems to interact with the DCA servers in the overlay without exposing the internal VLAN in the DCA.

This topology is only one of many possibilities. All the DCA servers can be included in the overlay; the overlay can be an extension of the customer's network; or the overlay can include only the servers that need to communicate in or out of the DCA.

Figure 5.1 VLAN Overlay Configuration
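
For reference, a tagged VLAN sub-interface that places a server in the overlay typically adds a second, VLAN-tagged IP address on the Interconnect interface. The following is a minimal sketch of what such a RHEL network script might contain; the interface name (eth2), VLAN ID (1000), and addressing are illustrative assumptions, not the DCA's actual configuration.

# Hypothetical example: /etc/sysconfig/network-scripts/ifcfg-eth2.1000
# Interface name, VLAN ID, and addresses are placeholders for illustration.
DEVICE=eth2.1000
VLAN=yes
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.100.11
NETMASK=255.255.255.0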

Planning for Remote Support - ESRS and Dialhome

The DCA supports remote support and dialhome through EMC Secure Remote Support (ESRS) as well as secure direct dialhome. Use the following information to plan remote support for the DCA.


ESRS Considerations
Review the following considerations for implementing remote support on the DCA through an ESRS gateway:
• The ESRS Gateway must be running a minimum version of 2.08.
• Port 22 between the ESRS Gateway and DCA must be open to allow for INCOMING support from the EMC Support Center.
• SMTP is supported for OUTGOING (dialhome) support. Port 25 must be open between the ESRS Gateway and DCA for SMTP support.
• FTP is supported for OUTGOING (dialhome) support. Port 21 must be open between the ESRS Gateway and DCA for FTP support.
• HTTPS is supported for OUTGOING (dialhome) support. Port 443 must be open between the ESRS Gateway and DCA for HTTPS support.

Secure Direct Dialhome Considerations
Review the following considerations for implementing dialhome on the DCA directly to EMC using the FTPS protocol:
• The DCA must have access to corpusfep3.emc.com and corpusfep4.emc.com using the passive FTPS protocol. The passive FTPS protocol uses ports 989 and 990 to establish a connection and a dynamic port range of 20000-30000 to transfer data.
• OUTGOING (dialhome) support only; INCOMING is not supported.
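
Before installation, outbound reachability of the direct dial-home endpoints can be spot-checked from the Master server. This is a sketch only, using bash's /dev/tcp against the FTPS control ports; it does not exercise the dynamic passive-FTP data range.

# Sketch: confirm the FTPS control ports to the EMC dial-home endpoints are open.
for host in corpusfep3.emc.com corpusfep4.emc.com; do
  for port in 989 990; do
    if timeout 5 bash -c "cat < /dev/null > /dev/tcp/$host/$port" 2>/dev/null; then
      echo "$host:$port open"
    else
      echo "$host:$port blocked"
    fi
  done
done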


6. DCA Administration

This chapter describes the database maintenance tasks and the tools available to diagnose, monitor, and troubleshoot a GPDB system running on the DCA. This chapter includes the following sections:
• DCA utilities
• Database and System Monitoring Tools
• SNMP on the DCA
• DCA MIB information
• Integrate DCA MIB with environment
• General Database Maintenance Tasks
• Next Steps

DCA utilities

The DCA provides the following utilities:
• dca_setup
• dcacheck
• dcaperfcheck
• dca_shutdown
• gppkg

To run these utilities, you must connect to the Primary or Standby Master server. For more information on GPDB-specific tools, see the DCAv2 Data Computing Appliance Administration Guide.

dca_setup
The dca_setup utility is an administration tool used to install, upgrade, and modify settings on a Data Computing Appliance (DCA). EMC recommends using the dca_setup utility rather than modifying the Linux configuration files directly, for the following reasons:

• Changes made through the dca_setup utility automatically take care of dependencies that may exist. For example, if a hostname is changed by some method other than the dca_setup utility, there is a possibility that not all files will be updated appropriately with the new hostname. These naming inconsistencies can lead to problems during configuration and upgrade processes.
• Operations through the dca_setup utility are recorded for audit purposes.
• The dca_setup utility is the EMC recommended and supported administration tool. Not using it could invalidate the customer's support warranty.

For usage information and a description of the operations available through dca_setup, see the utility reference in the DCAv2 Data Computing Appliance Administration Guide.

dcacheck
Validate hardware and operating system settings.


Synopsis
dcacheck { -f hostfile | -h hostname } { --stdout | --zipout } [ --config config_file ]
dcacheck --zipin dcacheck_zipfile
dcacheck -?

Description
The dcacheck utility validates DCA operating system and hardware configuration settings. The dcacheck utility can use a host file or a file previously created with the --zipout option to validate settings. At the end of a successful validation process, a DCACHECK_NORMAL message displays. If DCACHECK_ERROR displays, one or more validation checks failed. You can also use dcacheck to gather and view platform settings on hosts without running validation checks.

EMC recommends that you run dcacheck as the user root. If you do not run dcacheck as root, the utility displays a warning message and will not be able to validate all configuration settings; only a subset of the settings will be validated. Running dcacheck with no parameters validates settings in the following file:

/opt/dca/etc/dcacheck/dcacheck_config

The specific configuration parameters that are validated depend on the DCA software release.

Options

--config config_file The name of a configuration file to use instead of the default file /opt/dca/etc/dcacheck/dcacheck_config.

-f hostfile The name of a file that contains a list of hosts dcacheck uses to validate settings. This file should contain a single host name per line for all hosts in the DCA.

-h hostname The name of a host that dcacheck will use to validate platform-specific settings.

--stdout Display collected host information from dcacheck. No checks or validations are performed.

--zipout Save all collected data to a .zip file in the current working directory. dcacheck automatically creates the .zip file and names it dcacheck_timestamp.tar.gz. No checks or validations are performed.

--zipin file Use this option to decompress and check a .zip file created with the --zipout option. dcacheck performs validation tasks on the file you specify in this option.

-? Print the online help.


Examples
Verify and validate the DCA settings on specific servers:

# dcacheck -f /home/gpadmin/gpconfigs/hostfile
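
The hostfile referenced above is a plain text file listing one DCA host name per line. A minimal illustrative example (the host names shown follow typical DCA naming and may differ on your system):

mdw
smdw
sdw1
sdw2
sdw3
sdw4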

Verify custom settings on all DCA servers:

# dcacheck --config my_config_file

dcaperfcheck
Run performance tests on disk, memory and network.

Synopsis
dcaperfcheck { -f hostfile | -h hostname } { -r [d | s | n | N | M ] } [ -B size ] [ -S size ] -d test_dir { -v | -V } [ -D ] [ --duration seconds ] [ --netperf ]
dcaperfcheck -?

Description
The dcaperfcheck utility is used to test the performance of DCA hardware. It validates whether the network, disk, and memory components perform as expected. The test is useful for detecting hardware failures or mis-cabling. Users can run the dcaperfcheck utility as gpadmin or root. Running the utility as the user gpadmin requires read and write permissions on the test directory.

Options

-d test_directory Directory where data will be written to and read from. You can specify multiple -d flags for multiple directories on each server. During network and memory tests, this can be the /tmp directory. During disk tests, use operating system mount points that will exercise each drive.

-v Enable verbose output.

-V Enable very verbose output.

-D Print statistics for each host. The default output will print only the hosts with lowest and highest values.

-rd, -rs, -rn, -rN, -rM Specify the type of test to run: d - disk, s - stream (memory), n - serial netperf, N - parallel netperf, or M - full matrix netperf. You can combine options, for example, -rds. The default is dsn. Typically, disk and network tests are performed separately because disk tests require more test directories to be specified, whereas network tests only require a single, temporary directory.

-B size Specify the block size for disk performance tests. The default is 32KB. Examples: 1KB, 4MB.


-S size Specify the size of the test file used for disk performance tests. The default is 2x server memory. On a DCA, there is 64GB of memory, so the default is 128GB. Examples: 500MB, 16GB.

-h hostname Specify a host to run the utility. You can specify multiple hosts.

-f hostfile Specify a file that lists the hosts on which you want to run the utility. The hostfile you specify depends on the type of test (disk or network) that you intend to run.

--duration seconds Specify, in seconds, how long you want to run the network test.

--netperf Use the netperf network test instead of gpnetbenchServer/gpnetbenchClient. You can only run this option if you specify the network test.

-? Print online help.

Examples
Run a parallel network and stream test on Interconnect 1:

# dcaperfcheck -f /home/gpadmin/gpconfigs/hostfile_gpdb_ic1 -rsN -d /tmp

Run a disk test, using all the data directories on a Segment server, sdw1:

# dcaperfcheck -h sdw1 -rd -d /data1 -d /data2

dca_shutdown
The dca_shutdown utility safely powers off all servers in a DCA.

Synopsis
dca_shutdown { -f hostfile | -h hostname } [ --ignoredb ] [ --password=password ] [ --passfile=password_file ] [ --statusonly ]
dca_shutdown
dca_shutdown --help

Description
The dca_shutdown utility safely powers down all servers in a DCA. The utility can be run with no parameters, in which case it uses the system inventory generated by DCA Setup during an installation or a Regenerate DCA Config Files operation. If the utility is run with a hostfile or hostname specified, only those hosts will be shut down. This utility does not shut down the administration, Interconnect, or aggregation switches.

The utility should be run as the user root. Prior to running the dca_shutdown utility, the following steps should be performed to ensure a clean shut down:

1. Stop health monitoring as the user root:
$ su -
# dca_healthmon_ctl -d

2. Stop Greenplum Database:
$ gpstop -af

3. Disable HAWQ (if applicable):
$ ssh hdm2
$ /etc/init.d/hawq stop
$ exit

4. Disable Hadoop (if applicable):
$ icm_client list
$ icm_client stop -l

5. Disable Pivotal Command Center (if applicable):
# /etc/init.d/commander stop

6. Stop Command Center:
$ gpcmdr --stop
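
For a GPDB-only DCA (no HAWQ, Hadoop, or Pivotal Command Center), the applicable steps above can be sequenced in a small wrapper. The following is a sketch under that assumption, run as root on the Primary Master and assuming the gpadmin login environment sources the Greenplum environment; it is not a supported replacement for following the procedure interactively.

#!/bin/bash
# Sketch: pre-shutdown sequence for a GPDB-only DCA, run as root on the Primary Master.
dca_healthmon_ctl -d               # step 1: stop health monitoring
su - gpadmin -c "gpstop -af"       # step 2: stop Greenplum Database
su - gpadmin -c "gpcmdr --stop"    # step 6: stop Greenplum Command Center
dca_shutdown                       # power off all servers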

Options

-?, --help Print usage and help information.

-i, --ignoredb Do not check if Greenplum Database, health monitoring, or Command Center are running. Shut down all servers immediately.

-h, --host hostname Perform a shutdown on the host specified.

-f, --hostfile hostfile Perform a shutdown on the hosts listed in the hostfile. This option cannot be used with the --host option.

-p, --password password Specify a password to connect to the server’s IPMI (iDRAC) to perform the shutdown. The password is originally set during installation with DCA Setup - if an installation through DCA Setup has never been run, the user will be prompted for a password.

-s, --passfile password_file Specify a file containing the password to use to connect to the server’s IPMI (iDRAC) to perform the shutdown. This file is generated during installation with DCA Setup, and is located in /opt/dca/etc/ipmipasswd.

-o, --statusonly Print the power status (ON | OFF) of all servers. This will not power off any servers.

Examples
Shut down all servers in a DCA:

dca_shutdown

Shut down servers listed in the file hostfile:

dca_shutdown -f /home/gpadmin/gpconfigs/hostfile

gppkg
Installs GPDB extensions such as pgcrypto, PL/R, PL/Java, PL/Perl, PostGIS, and MADlib, along with their dependencies, across an entire cluster.

Synopsis
gppkg [ -i package | -u package | -r name-version | -c ] [ -d master_data_directory ] [ -a ] [ -v ]
gppkg --migrate GPHOME_1 GPHOME_2 [ -a ] [ -v ]
gppkg [ -q | --query ] query_option
gppkg -? | --help | -h
gppkg --version

Description
The Greenplum Package Manager (gppkg) utility installs GPDB extensions, along with any dependencies, on all hosts across a cluster. It also automatically installs extensions on new hosts in the case of system expansion and segment recovery. First, download one or more of the available packages from Pivotal Network, then copy them to the Master host. Use the Greenplum Package Manager to install each package using the options described below.

Note: After a major upgrade to GPDB, you must download and install all extensions again.

Examples of database extensions and package software delivered using the Greenplum Package Manager are:
• PostGIS
• PL/Java
• PL/R
• PL/Perl
• MADlib
• pgcrypto
Note that Greenplum Package Manager installation files for extension packages may be released outside of standard GPDB release cycles. For information about supported package extensions, see the Pivotal GPDB Release Notes for your release.

Options

-a (do not prompt) Do not prompt the user for confirmation.

-c | --clean Reconciles the package state of the cluster to match the state of the master host. Running this option after a failed or partial install/uninstall ensures that the package installation state is consistent across the cluster.

-d master_data_directory The master data directory. If not specified, the value set for $MASTER_DATA_DIRECTORY will be used.


-i package | --install=package Installs the given package. This includes any pre/post installation steps and installation of any dependencies.

--migrate GPHOME_1 GPHOME_2 Migrates packages from a separate $GPHOME. Carries over packages from one version of GPDB to another. For example:
gppkg --migrate /usr/local/greenplum-db-4.2.0.1 /usr/local/greenplum-db-4.2.1.0
This option is automatically invoked by the installer during minor upgrades. It is given here for cases when the user wants to migrate packages manually. Migration can only proceed if gppkg is executed from the installation directory to which packages are being migrated. That is, GPHOME_2 must match the $GPHOME from which the currently executing gppkg is being run.

-q | --query query_option Provides information specified by query_option about the installed packages. Only one query_option can be specified at a time. The following table lists the possible values for query_option, where <package_name> is the name of a package.

Table 6.1 Query Options for gppkg

query_option Returns

<package_name> Whether the specified package is installed.

--info The name, version, and other information about the specified package.

--list The file contents of the specified package.

--all List of all installed packages.

-r name-version | --remove=name-version Removes the specified package.

-u package | --update=package Updates the given package.

Note: The process of updating a package includes removing all previous versions of the system objects related to the package. For example, previous versions of shared libraries are removed.

After the update process, a database function will fail when it is called if the function references a package file that has been removed.

--version (show utility version) Displays the version of this utility.


-v | --verbose Sets the logging level to verbose.

-? | -h | --help Displays the online help.
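
The reference above does not include an examples section; the following illustrative invocations show common usage. The package file name is a placeholder for a file downloaded from Pivotal Network.

Install a downloaded package, including its dependencies, across the cluster:

$ gppkg -i madlib-<version>.gppkg

List all packages installed on the cluster:

$ gppkg -q --all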

Database and System Monitoring Tools

The Data Computing Appliance provides various tools to monitor the status of GPDB as well as the hardware components it runs on. This section contains the following topics:
• ConnectEMC Dial Home Capability
• Web-Based Management Options
• GPDB Email and SNMP Alerting

ConnectEMC Dial Home Capability

The Data Computing Appliance and Data Integration Accelerator support dial home functionality through the ConnectEMC software. ConnectEMC is a support utility that collects and sends event data (files indicating system errors and other information) from EMC products to EMC Global Services customer support. ConnectEMC sends DCA event files using the secure file transfer protocol (FTPS). If an EMC Secure Remote Support (ESRS) Gateway is used for connectivity, HTTPS or FTP are available protocols for sending alerts. For default DCA port numbers, see Table 5.1, "Site-Specific Information". The ConnectEMC software is configured on the DCA Master and Standby Master servers, and event files are sent out through the external connection (eth1) either to an ESRS Gateway server or directly to EMC.

Dial Home Severity Levels

Alerts that arrive at EMC Global Services can have one of the following severity levels:

• Severity 0 — UNKNOWN: This severity level is associated with hosts and devices on the DCA that are either disabled due to hardware failure or unreachable for some reason. This severity creates a service request.
• Severity 1 — ERROR: This indicates that an error occurred on the DCA. System operation, performance, or both are likely affected. This severity creates a service request.
• Severity 2 — WARNING: This indicates a condition that might require immediate attention. This severity creates a service request.
• Severity 3 — INFO: This severity level indicates that a previously reported error condition is now resolved. An event with this severity level is also used to provide information about the system that does not require any action. This severity does not create a service request. For example, GPDB startup triggers an INFO alert.

The severity of events determines if a service request is created for EMC support to act on. The events listed in Table 6.2, "DCA Error Codes," can generate multiple severity levels based on the error condition.

For example, if a segment server disk drive fails, Symptom Code 13 is generated with a severity of ERROR. The ConnectEMC software dials home to Global Services customer support, and a service request is created. On successful replacement of the disk drive, Symptom Code 13.11001 is generated again, this time with a severity of INFO to note that the disk drive was replaced.


ConnectEMC Event Alerts

The following table lists the conditions that cause ConnectEMC to send event data alerts to EMC Global Services.

Table 6.2 DCA Error Codes

Code Description

1.1 Host not responding to SNMP calls, host may be down.

1.4 Interface status: could not open session to host 1.

2.15 Greenplum Database is ready to accept connections.

2.15000 GPDB status– Sent from inside GPDB such as panics, GPDB start.

2.15001 GPDB status– could not access the status of a transaction.

2.15002 GPDB status– interrupted in recovery.

2.15003 2 Phase file corrupted

2.15005 Greenplum Database panic, insufficient resource queues available.

3.2000 Status of power supply, if PS fails, will get error with this code.

3.2004 Server power supply monitoring (using IPMI).

5.4000 Server status.

5.4001 Status of cooling device, e.g., fan failure.

5.4002 Temperature of system.

6.5001 Status check of a CPU. CPU failure will register here.

9.7000 Memory device status. Failed memory devices will get this code.

10.8003 Status of the network device.

10.8005 A configured network bond is unavailable.

10.8006 Network bonding on master servers: The bond interface has no active link/slave.

10.8007 Network bonding on master servers: The bond interface link/slave has changed.

10.8008 Network bonding on master servers: The bond interface links are all down.

10.8009 Network bonding on master servers: One of the bond interface links is down.

11.9001 Status of IO Controller

12.10002 Virtual Disk Status: one of the configured drives has failed or is offline

12.10004 Virtual disk size (MB)

12.10005 Write cache policy on virtual disk. For example, expected to be write back mode.

12.10007 Detects offline, rebuilding raid, and other unexpected virtual disk states.

12.10011 The percentage of disk space used on virtual disk has exceeded the error threshold, Example: mdw: 12.10011 : Disk space used 2%: error: one or more disk usage exceed error threshold, System is configured to operate under 1% disk capacity.

12.10014 Virtual disk space used (KB).


13.11001 Physical disk needs to be replaced. Slot number and capacity of the disk are indicated. Example: mdw: 13.11001: Physical Disk slot 6 Status: warning: unconfigured-good: Dev Id 6 : Adp Id 0 : Size 279 GiB

14.12002 Interconnect Switch Operational Status.

14.12008 Switch thermal status (V2).

14.12009 Displays Switch power supply status 1

14.12010 Switch power supply status 2

14.12011 mLAG status with peer switch

14.12012 mLAG status of port 1

14.12013 mLAG status of port 2

14.12014 mLAG status of port 3

14.12015 mLAG status of port 4

14.12016 mLAG status of port 5

14.12017 mLAG status of port 6

14.12018 mLAG status of port 7

14.12019 mLAG status of port 8

14.12020 mLAG status of port 9

14.12021 mLAG status of port 10

14.12022 mLAG status of port 11

14.12023 mLAG status of port 12

14.12024 mLAG status of port 13

14.12025 mLAG status of port 14

14.12026 mLAG status of port 15

14.12027 mLAG status of port 16

14.12028 mLAG status of port 17(mdw)

14.12029 mLAG status of port 18(smdw)

14.12030 mLAG status of port 19

14.12031 mLAG status of port 20

14.12032 mLAG status of port 21

14.12033 mLAG status of port 22

14.12034 LAG status

14.13007 Status errors from switch sensors – Fans (V2).

14.14000 Interface 0 Description: unexpected snmp value: val_len<=0.


14.14001 Interface 0 Status: unexpected status from device.

15.20000 An error is detected in the SNMP configuration of the host. Indicates an issue with the IP address setting in the SNMP configuration.

15.30000 Other SNMP related errors.

15.40000 Connection aborted by SNMP.

15.50000 Unexpected SNMP errors from the SNMP system libraries.

15.60000 Can not find expected OID during SNMP walk.

16.00000 Test Dial Home.

18.15000 Sent from inside GPDB when starting up.

18.15001 Sent from inside GPDB when GPDB could not access the status of a transaction.

18.15002 Sent from inside GPDB when interrupted in recovery.

18.15003 Sent from inside GPDB when a 2 phase file is corrupted.

18.15004 A test message sent from inside GPDB.

18.15005 Sent from inside GPDB when hitting a panic.

18.17000 Sent by healthmond when GPDB status is normal.

18.17001 Sent by healthmond when GPDB can not be connected to and was not shut down cleanly, possible GPDB failure.

18.17002 Sent by healthmond when detecting a failed segment.

18.17006 Sent by healthmond when detecting a move of the master segment from mdw to smdw.

18.17007 Sent by healthmond when detecting a move of the master segment from smdw to mdw.

18.17008 Sent by healthmond when a query fails during health checking.

18.17009 Healthmond error querying GPDB State.

18.17010 Database starts (informational only).

18.17011 Database stops (informational only).

19.18000 ID for informational dial homes with general system usage information.

21.20000 Core files were found on the system.

21.20001 Linux kernel core dump files were found on the system - indicates a crash and reboot.

21.20002 GPDB (PostgreSQL) core dump files were found on the system - indicates a crash and reboot.

22.21000 Master Node Failover was successful.

22.21001 gpactivatestandby command failed during master node failover.

22.21002 Greenplum Database is not reachable after the failover.

22.21003 Error in bringing the remote (other) master server down during master node failover.


22.21004 Error in taking over the remote (other) master server IP.

22.21005 Unknown error in failover.

23.22002 Host did not complete upgrade within the specified timeout period. Timeout period is 12 hours by default unless set in /opt/dca/etc/healthmond/healthmond.cnf.

Web-Based Management Options

For DCA, both Pivotal Greenplum Command Center and Pivotal Command Center are installed. Pivotal Command Center manages Pivotal Hadoop only. Pivotal Greenplum Command Center is required, even if no GPDB elements are part of the cluster, to provide health monitoring and dial home support for the cluster hardware.
• Pivotal Greenplum Command Center
• Pivotal Command Center

Pivotal Greenplum Command Center

Pivotal Greenplum Command Center allows administrators to collect query and system performance metrics from a running GPDB system. Monitor data is stored within GPDB. Pivotal Greenplum Command Center is comprised of data collection agents that run on the master host and each segment host. The agents collect performance data about active queries and system utilization and send it to the master at regular intervals. The data is stored in a dedicated database on the master, where it can be accessed using the Greenplum Command Center web application or SQL queries.

Pivotal Greenplum Command Center is a browser-based application that administrators can use to view active and historical query and system metrics stored in the gpperfmon database. By default, Pivotal Greenplum Command Center is installed on the GPDB master host using HTTP or HTTPS port 28080. It can be accessed through a browser using a URL such as masterhostname.companydomain.com:28080. Before you can log into Pivotal Greenplum Command Center, the GPDB administrator must assign you a username and password. For instructions on granting access, see the Pivotal Greenplum Performance Monitor Administrator Guide.

Pivotal Command Center

Pivotal Command Center allows an administrative user to administer and monitor one or more Pivotal HD clusters. The Command Center has command-line tools to deploy and configure Pivotal HD clusters, as well as an intuitive graphical user interface (GUI) that is designed to help the user view the status of the clusters and take appropriate action. This release of Command Center allows only administering and monitoring of Pivotal HD Enterprise 1.0.x clusters. Pivotal Command Center 2.0.x is comprised of the following:
• Pivotal Command Center UI
• Pivotal HD Manager
• Performance Monitor (nmon)


PCC User Interface

The PCC UI provides the user with a single web-based GUI to monitor and manage one or more Pivotal HD clusters. This web application is hosted on a Ruby-on-Rails application that presents the status and metrics of the clusters. The data comes from multiple sources: all of the Hadoop-specific data comes from the Pivotal HD Manager component, and the system metrics data is gathered by the Performance Monitor (nmon) component. The UI can be accessed through a browser using a URL such as http://masterhostname.companydomain.com:5000/. For more detail and instructions about Pivotal Command Center, see the Pivotal Command Center 2.x Installation Guide.

GPDB Email and SNMP Alerting

The GPDB system can be configured to trigger SNMP alerts or send email notifications to system administrators when certain database events occur. These events can include fatal server errors, segment shutdown and recovery, and database system shutdown and restart. For instructions on enabling system alerts and email notifications, see the DCAv2 Data Computing Appliance Administration Guide.

Note: The Greenplum Database (GPDB) does not support SNMPv3 for this release.

SNMP on the DCA

The DCA includes SNMP version 3 (SNMPv3) for authentication and encryption of health monitoring alerts. For more information on using SNMPv3 with the DCA, see the DCAv2 Data Computing Appliance Implementation Guide. The DCA has an SNMP version 2 management information base (MIB). The MIB can be used by enterprise monitoring systems to identify issues with components and services in the DCA. This section includes the following major topics:
• DCA MIB information
• Integrate DCA MIB with environment

DCA MIB information

This section describes the following:
• MIB Locations
• MIB Contents
• View MIB

MIB Locations
The DCA MIBs are located in the following locations:

/usr/share/snmp/mibs/GP-DCA-TRAP-MIB.txt
/usr/share/snmp/mibs/GP-DCA-DATA-MIB.txt


MIB Contents
The DCA public MIB is organized under the enterprise OID 1.3.6.1.4.1.1139.23.1.1.X.y, where X is the DCA MIB OID (1 = Trap MIB, 2 = Data MIB) and y is the component OID.

Trap MIB components (1.y):
1.1 Trap Notifications
1.2 Symptom Code
1.3 Detailed Symptom Code
1.4 Description
1.5 Severity
1.6 Hostname

Data MIB components (2.y):
2.1 DCA v1 Hardware
2.2 DCA UAP Edition Hardware
2.3 Services
2.4 Software Version
2.5 Hadoop Version
2.6 Basic System Information

Figure 6.1 MIB OID Structure

Table 6.3 DCA Data MIB - v2 Hardware and DCA Services Components

Data MIB Contents Component OID Description

1 - gpDCAv1Hardware gpMasterNodes 1 GPDB Primary and Standby Master servers.

gpSegmentNodes 2 GPDB Segment servers.

gpAdminSwitches 3 DCA administration switches.

gpInterconnectSwitches 4 DCA Interconnect switches.

gpEtlNodes 5 DIA servers.

gpHadoopMasterNodes 6 Hadoop Master servers.

gpHadoopWorkerNodes 7 Hadoop Worker servers.

gpAggAdminSwitches 8 DCA aggregation administration switches.

gpAggInterconnectSwitches 9 DCA aggregation Interconnect switches.

gpHbaseComputeNodes 10 Hadoop Compute servers.


2 - gpDCAv2Hardware gpMasterNodes 1 GPDB Master servers.

gpSegmentNodes 2 GPDB Segment servers.

gpAdminSwitches 3 DCA administration switches.

gpInterconnectSwitches 4 DCA Interconnect switches.

gpEtlNodes 5 DIA servers.

gpHadoopMasterNodes 6 Hadoop Master servers.

gpHadoopWorkerNodes 7 Hadoop Worker servers.

gpAggAdminSwitches 8 DCA aggregation administration switches.

gpAggInterconnectSwitches 9 DCA aggregation Interconnect switches.

gpHadoopComputeNodes 10 Hadoop Compute servers.

3 - gpDCAServices gpDbService 1 Greenplum Database processes.

gpHadoopService 2 Hadoop processes.

Table 6.4 DCA Trap MIB

Trap MIB Contents Description

1 - gpDCATrap This OID is used for notifications generated for a hardware or database event.

2 - gpDCATrapSymCode Symptom code for the event.

3 - gpDCATrapDetailedSymCode Detailed symptom code for the event.

4 - gpDCATrapDesc Description of the event.

5 - gpDCATrapSeverity Severity of the event: 0 - unknown 1 - error 2 - warning 3 - info 4 - debug

6 - gpDCATrapHostname Server where the event occurred.

An example healthmon dialhome message looks like this:

(955): snmp_vals=['11','9002','Controller Battery 1 Status: ok','3','smdw : smdw']; Event Code 11.9002, Severity: Informational (3) - Message about smdw (standby master)


The table below shows how each element of the message corresponds to the rows in Table 6.4, “DCA Trap MIB” above.

Trap Notification: snmp_vals=
Symptom Code: 11
Detailed Symptom Code: 9002
Description: Controller Battery 1 Status: ok
Severity: 3
Hostname (Internal: Custom): smdw : smdw

View MIB
Issue the following commands from a Master server as the user root:

# MIBS+=GP-DCA-DATA-MIB
# export MIBS
# snmpwalk -v 2c -c public 172.28.4.250 1.3.6.1.4.1.1139.23.1.1.2

Table 6.5 below shows examples of actual trap descriptions and trap severities. The list is not comprehensive.

Table 6.5 Example trap descriptions and severities

1.1 Host not responding to SNMP calls, host may be down

Unknown

Power Supply Name: timeout

Upgrade State: timeout

Operational Status: timeout

Interface Description: timeout

Array Disk Name: timeout

Network Device Name: timeout

Virtual Disk Device Name: timeout

sysDescr: timeout

Virtual Disk Read Policy: timeout

Virtual Disk State: timeout

Controller Name: timeout

Network Device Ip Address: timeout

Virtual Disk Write Policy: timeout

Sensor Name: timeout

Network Device Status: timeout

Battery Status: timeout

Disk Space Used Percentage on Segment (/) Value: timeout

Power Supply Status: timeout


Memory Device Status: timeout

Cache Device Status: timeout

Controller Battery State: timeout

Interface 404161031 Description: timeout

Cooling Device High critical temp: timeout

1.4

Unknown

Interface Status: could not open session to host

2.15 GPDB status - Sent from inside GPDB such as panics, GPDB start

Info database system is ready to accept connections PostgreSQL 8.2.15 (Greenplum Database 4.3.5.2 build 4) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 5.2 compiled on Aug 2 2015 10:46:48

2.15005

Error

PANIC: insufficient resource queues available

PANIC: proclock table corrupted (lock.c:1247)

PANIC: could not write to file "pg_xlog/xlogtemp.12691": No space left on device

PANIC: Unable to complete 'Abort Prepared' broadcast for gid = 1323052957-0000019676 (cdbtm.c:930)

PANIC: Waiting on lock already held! (lwlock.c:552)

PANIC: could not open file "global/pg_control": No such file or directory

PANIC: out of shared memory

3.2 Status of power supply, if PS fails, will get error with this code

Error

PS 1 Status: critical

PS 2 Status: critical

Info

PS 2 Status: ok

PS 1 Status: ok

Warning

PS 2 Status: nonCritical

5.4002 Temperature of system2

Warning

System Temperature: Temperature not in normal range

9.7 Memory device status. Failed memory devices will get this code.


Error

Memory Device 1 Status: critical

Memory Device 5 Status: critical

Info

Memory Device 1 Status: ok

Memory Device 5 Status: ok

Warning

Memory Device 6 Status: nonCritical

Memory Device 1 Status: nonCritical

10.8003 Status of the network device.

11.9001 Status of IO Controller.

Error

Controller 1 Status: Degraded

Info

Controller 1 Status: ok

11.9002 Status of battery on the IO Controller.

Error

Controller Battery 1 Status: critical

Info

Controller Battery 1 Status: ok

Warning

Controller Battery 1 Status: nonCritical

12.10002

Error

Virtual Disk 4 Status: /dev/sdd: critical

Virtual Disk 3 Status: /dev/sdc: critical

Info

Virtual Disk 3 Status: /dev/sdc: ok

Virtual Disk 4 Status: /dev/sdd: ok

Virtual Disk 2 Status: /dev/sdb: ok

Virtual Disk 1 Status: /dev/sda: ok

Warning

Virtual Disk 3 Status: /dev/sdc: nonCritical

Virtual Disk 4 Status: /dev/sdd: nonCritical


Virtual Disk 1 Status: /dev/sda: nonCritical

Virtual Disk 2 Status: /dev/sdb: nonCritical

12.10005 Write cache policy on virtual disk. For example, expected to be write back mode.

Info

Virtual Disk 2 Write Policy: /dev/sdb: LSI Write Back

Virtual Disk 3 Write Policy: /dev/sdc: LSI Write Back

Virtual Disk 1 Write Policy: /dev/sda: LSI Write Back

Virtual Disk 4 Write Policy: /dev/sdd: LSI Write Back

Virtual Disk 6 Write Policy: /dev/sdf: LSI Write Back

Virtual Disk 5 Write Policy: /dev/sde: LSI Write Back

Virtual Disk 11 Write Policy: /dev/sdk: LSI Write Back

Virtual Disk 8 Write Policy: /dev/sdh: LSI Write Back

Virtual Disk 13 Write Policy: /dev/sdm: LSI Write Back

Virtual Disk 14 Write Policy: /dev/sdn: LSI Write Back

Virtual Disk 15 Write Policy: /dev/sdo: LSI Write Back

Virtual Disk 7 Write Policy: /dev/sdg: LSI Write Back

Virtual Disk 9 Write Policy: /dev/sdi: LSI Write Back

Virtual Disk 12 Write Policy: /dev/sdl: LSI Write Back

Virtual Disk 10 Write Policy: /dev/sdj: LSI Write Back

Virtual Disk 16 Write Policy: /dev/sdp: LSI Write Back

Warning

Virtual Disk 2 Write Policy: /dev/sdb: LSI Write Through

Virtual Disk 3 Write Policy: /dev/sdc: LSI Write Through

Virtual Disk 1 Write Policy: /dev/sda: LSI Write Through

Virtual Disk 4 Write Policy: /dev/sdd: LSI Write Through

Virtual Disk 2 Write Policy: /dev/sdb: Enabled Always (SAS)

Virtual Disk 1 Write Policy: /dev/sda: Enabled Always (SAS)

Virtual Disk 4 Write Policy: /dev/sdd: Enabled Always (SAS)

Virtual Disk 3 Write Policy: /dev/sdc: Enabled Always (SAS)

Virtual Disk 5 Write Policy: /dev/sde: LSI Write Through

Virtual Disk 6 Write Policy: /dev/sdf: LSI Write Through

Virtual Disk 16 Write Policy: /dev/sdp: LSI Write Through

Virtual Disk 10 Write Policy: /dev/sdj: LSI Write Through

Virtual Disk 12 Write Policy: /dev/sdl: LSI Write Through


Virtual Disk 14 Write Policy: /dev/sdn: LSI Write Through

Virtual Disk 7 Write Policy: /dev/sdg: LSI Write Through

Virtual Disk 9 Write Policy: /dev/sdi: LSI Write Through

Virtual Disk 8 Write Policy: /dev/sdh: LSI Write Through

Virtual Disk 13 Write Policy: /dev/sdm: LSI Write Through

Virtual Disk 15 Write Policy: /dev/sdo: LSI Write Through

Virtual Disk 11 Write Policy: /dev/sdk: LSI Write Through

12.10006 Read cache policy of virtual disk. For example, expected to be adaptive read ahead.

Warning

Virtual Disk 1 Read Policy: /dev/sda: LSI No Read Ahead

Virtual Disk 3 Read Policy: /dev/sdc: LSI No Read Ahead

Virtual Disk 2 Read Policy: /dev/sdb: LSI No Read Ahead

12.10007 Detects offline, rebuilding raid and other unexpected virtual disk states.

Error

Virtual Disk 3 State: /dev/sdc: Degraded

Virtual Disk 4 State: /dev/sdd: Degraded

Virtual Disk 1 State: /dev/sda: Degraded

Virtual Disk 2 State: /dev/sdb: Degraded

Info

Virtual Disk 4 State: /dev/sdd: Ready

Virtual Disk 2 State: /dev/sdb: Ready

Virtual Disk 3 State: /dev/sdc: Ready

Virtual Disk 1 State: /dev/sda: Ready

Warning

Virtual Disk 4 State: /dev/sdd: Background Initialization

Virtual Disk 2 State: /dev/sdb: Background Initialization

Virtual Disk 3 State: /dev/sdc: Background Initialization

Virtual Disk 1 State: /dev/sda: Background Initialization

12.10011 Percentage of disk space on virtual disk used.

Error

Disk Space Used Percentage on Segment (/data1) 2 Value: value 90 outside of range 0 to 89

Disk Space Used Percentage on Segment (/data2) 3 Value: value 90 outside of range 0 to 89

Disk Space Used Percentage on Segment (/data2) 3 Value: value 93 outside of range 0 to 89

Disk Space Used Percentage on Segment (/) 1 Value: value 100 outside of range 0 to 89


Info

Disk Space Used Percentage on Segment (/data2) 3 Value: 79

Disk Space Used Percentage on Segment (/data1) 2 Value: 79

Disk Space Used Percentage on Segment (/) 1 Value: 16

Warning

Disk Space Used Percentage on Segment (/data2) 3 Value: value 80 outside of range 0 to 79

Disk Space Used Percentage on Segment (/data1) 2 Value: value 80 outside of range 0 to 79

Disk Space Used Percentage on Master (/) 1 Value: value 84 outside of range 0 to 79

13.11001 Status of drive. Drive failures use this ID.

Error

Array Disk 9 Status: critical

Array Disk 6 Status: critical

Array Disk 10 Status: critical

Info

Array Disk 8 Status: ok

Array Disk 2 Status: ok

Array Disk 3 Status: ok

Warning

Array Disk 5 Status: nonCritical

Array Disk 4 Status: nonCritical

Array Disk 12 Status: nonCritical

Array Disk 11 Status: nonCritical

14.12002 Interconnect Switch Operational Status.

Error

Operational Status: down

Info

Operational Status: ok

Unknown

Operational Status: unexpected status from device

14.13001 Status errors from switch sensors – Fans, Power Supplies, and Temperature.

Error

Sensor 9 Status: failed: Power Supply #2 -- sensor 9: type 5 is faulty, value is 0

Sensor 8 Status: failed: Power Supply #1 -- sensor 8: type 5 is faulty, value is 0

Info


Sensor 9 Status: ok: Power Supply #2 -- sensor 9: type 5 is OK, value is 1

Sensor 8 Status: ok: Power Supply #1 -- sensor 8: type 5 is OK, value is 1

14.14

Unknown

Interface 0 Description: unexpected snmp value: val_len<= 0

14.14001

Unknown

Interface 0 Status: unexpected status from device

15.2 An error detected in the SNMP configuration of the host

Error

Crash files on system: SNMP configuration issue on host

Core files on system: SNMP configuration issue on host

Disk Space Used Percentage on Segment (/data2) Value: SNMP configuration issue on host

Disk Space Used Percentage on Segment (/data1) Value: SNMP configuration issue on host

Cache Device Size: SNMP configuration issue on host

Power Supply Name: SNMP configuration issue on host

Disk Space Used Percentage on Master (/) Value: SNMP configuration issue on host

Power Probe Type: SNMP configuration issue on host

Cooling Device Low critical temp: SNMP configuration issue on host

Virtual Disk Device Name: SNMP configuration issue on host

Network Device Ip Address: SNMP configuration issue on host

Power Supply Volts: SNMP configuration issue on host

Array Disk Name: SNMP configuration issue on host

Network Device Status: SNMP configuration issue on host

System Temperature: SNMP configuration issue on host

Processor Status: SNMP configuration issue on host

Memory Device Status: SNMP configuration issue on host

Controller Name: SNMP configuration issue on host

Disk Space Used on Master in megabytes (/data) Value: SNMP configuration issue on host

OS Memory Status: SNMP configuration issue on host

Virtual Disk Read Policy: SNMP configuration issue on host

Virtual Disk Write Policy: SNMP configuration issue on host

Battery Status: SNMP configuration issue on host

Cache Device Status: SNMP configuration issue on host


Power Supply Status: SNMP configuration issue on host

Power Probe Value: SNMP configuration issue on host

Power Probe Name: SNMP configuration issue on host

Virtual Disk State: SNMP configuration issue on host

Controller Battery Status: SNMP configuration issue on host

Cooling Device High critical temp: SNMP configuration issue on host

Controller Battery State: SNMP configuration issue on host

Cooling Device Status: SNMP configuration issue on host

Percentage of idle CPU time: SNMP configuration issue on host

Percentage of user CPU time: SNMP configuration issue on host

Controller Status: SNMP configuration issue on host

RAM available: SNMP configuration issue on host

Network Device Name: SNMP configuration issue on host

Swap space available: SNMP configuration issue on host

Percentage of system CPU time: SNMP configuration issue on host

Swap space total: SNMP configuration issue on host

Cooling Device Name: SNMP configuration issue on host

Upgrade State: SNMP configuration issue on host

RAM total: SNMP configuration issue on host

15.3 Other SNMP-related errors

Error

Power Supply Name: Got unexpected error looking for snmp OID

Virtual Disk Device Name: Got unexpected error looking for snmp OID

Cooling Device High critical temp: Got unexpected error looking for snmp OID

Controller Name: Got unexpected error looking for snmp OID

Disk Space Used on Segment in kilobytes (/data2) Value: Got unexpected error looking for snmp OID

Crash files on system: Got unexpected error looking for snmp OID

Power Probe Value: Got unexpected error looking for snmp OID

Cooling Device Low critical temp: Got unexpected error looking for snmp OID

Cooling Device Status: Got unexpected error looking for snmp OID

OS Memory Status: Got unexpected error looking for snmp OID

Array Disk Status: Got unexpected error looking for snmp OID

Power Supply Volts: Got unexpected error looking for snmp OID

Power Supply Status: Got unexpected error looking for snmp OID


Network Device Name: Got unexpected error looking for snmp OID

Array Disk Name: Got unexpected error looking for snmp OID

Network Device Ip Address: Got unexpected error looking for snmp OID

Cache Device Size: Got unexpected error looking for snmp OID

Controller Status: Got unexpected error looking for snmp OID

Disk Space Used Percentage on Segment (/) Value: Got unexpected error looking for snmp OID

System Temperature: Got unexpected error looking for snmp OID

Battery Status: Got unexpected error looking for snmp OID

Virtual Disk Read Policy: Got unexpected error looking for snmp OID

Disk Space Used Percentage on Segment (/data2) Value: Got unexpected error looking for snmp OID

Cooling Device Name: Got unexpected error looking for snmp OID

Virtual Disk Write Policy: Got unexpected error looking for snmp OID

Virtual Disk State: Got unexpected error looking for snmp OID

Virtual Disk Status: Got unexpected error looking for snmp OID

Memory Device Status: Got unexpected error looking for snmp OID

Network Device Status: Got unexpected error looking for snmp OID

Power Probe Type: Got unexpected error looking for snmp OID

Cache Device Status: Got unexpected error looking for snmp OID

Processor Status: Got unexpected error looking for snmp OID

Power Probe Name: Got unexpected error looking for snmp OID

Core files on system: Got unexpected error looking for snmp OID

Controller Battery State: Got unexpected error looking for snmp OID

Controller Battery Status: Got unexpected error looking for snmp OID

Disk Space Used Percentage on Segment (/data1) Value: Got unexpected error looking for snmp OID

Disk Space Used on Segment in kilobytes (/data1) Value: Got unexpected error looking for snmp OID

Interface Description: Got unexpected error looking for snmp OID

Operational Status: Got unexpected error looking for snmp OID

Disk Space Used on Master in kilobytes (/data) Value: Got unexpected error looking for snmp OID

Disk Space Used Percentage on Master (/data) Value: Got unexpected error looking for snmp OID

Sensor Status: Got unexpected error looking for snmp OID

Interface Status: Got unexpected error looking for snmp OID

15.6 Can not find expected OID during SNMP walk

Error

Power Supply Name: Data not found for expected snmp OID


Memory Device Status: Data not found for expected snmp OID

Network Device Ip Address: Data not found for expected snmp OID

OS Memory Status: Data not found for expected snmp OID

Power Supply Status: Data not found for expected snmp OID

Controller Status: Data not found for expected snmp OID

Power Supply Volts: Data not found for expected snmp OID

Virtual Disk Read Policy: Data not found for expected snmp OID

Virtual Disk Write Policy: Data not found for expected snmp OID

Cooling Device Name: Data not found for expected snmp OID

Cooling Device High critical temp: Data not found for expected snmp OID

Cache Device Size: Data not found for expected snmp OID

Power Probe Value: Data not found for expected snmp OID

Cooling Device Low critical temp: Data not found for expected snmp OID

Network Device Status: Data not found for expected snmp OID

Virtual Disk Status: Data not found for expected snmp OID

Virtual Disk Device Name: Data not found for expected snmp OID

Controller Battery State: Data not found for expected snmp OID

System Temperature: Data not found for expected snmp OID

Virtual Disk State: Data not found for expected snmp OID

Battery Status: Data not found for expected snmp OID

Network Device Name: Data not found for expected snmp OID

Cache Device Status: Data not found for expected snmp OID

Array Disk Status: Data not found for expected snmp OID

Controller Name: Data not found for expected snmp OID

Array Disk Name: Data not found for expected snmp OID

Power Probe Name: Data not found for expected snmp OID

Power Probe Type: Data not found for expected snmp OID

Controller Battery Status: Data not found for expected snmp OID

Cooling Device Status: Data not found for expected snmp OID

Processor Status: Data not found for expected snmp OID

Sensor Status: Data not found for expected snmp OID

Sensor Message: Data not found for expected snmp OID

Processor Device Status: Data not found for expected snmp OID

Sensor Name: Data not found for expected snmp OID


Operational Status: Data not found for expected snmp OID

16 Test Dial Home

Error

EMC Connect Test Error Alert

Info

EMC Connect Test Info Alert

18.17 Sent by healthmond when GPDB status is normal.

Info

GPDB Status: GPDB not running

GPDB Status: ok

18.17001 Sent by healthmond when GPDB can not be connected to and was not shutdown cleanly, possible GPDB failure.

Error

GPDB Status: fe_sendauth: no password supplied

GPDB Status: timeout expired

GPDB Status: FATAL: Upgrade in progress, connection refused

GPDB Status: FATAL: no pg_hba.conf entry for host "172.28.10.250", user "gpadmin", database "template1", SSL off

GPDB Status: FATAL: could not open file "global/pg_database": No such file or directory

Connection Status: Unsuccessful

GPDB Status: FATAL: DTM initialization: failure during startup/recovery, retry failed, check segment status (cdbtm.c:1351)

GPDB Status: FATAL: semctl(7241763, 14, SETVAL, 0) failed: Invalid argument (pg_sema.c:154)

GPDB Status: could not connect to server: No route to host Is the server running on host "mdw" and accepting TCP/IP connections on port 5432?

Info

Connection Status: ok

18.17002 Sent by healthmond when detecting a failed segment.

Error

GPDB Status: One or more segments are down

Count of segments down: 6

Count of segments down: 12

Count of segments down: 1

Count of segments down: 4

Count of segments down: 3

Info


Count of segments down: 0

18.17006 Sent by healthmond when detecting a move of the master segment from mdw to smdw.

Warning

GPDB has moved to smdw from mdw

18.17007 Sent by healthmond when detecting a move of the master segment from smdw to mdw.

Warning

GPDB has moved to mdw from smdw

18.17008 Sent by healthmond when a query fails during health checking.

Error

GPDB Status: Database mirrors are not in sync with the master

18.17009 Healthmond error querying GPDB State.

Error

GPDB Status: no connection to the server

19.18 ID for informational dial homes with general system usage information.

Info

Informational Dial Home

21.2 Core files were found on the system.

Error

Core files on system: Core files present on system

Info

Core files on system: ok

22.21 Master Node Failover was successful.

Info

Successful

22.21001 GPActivatestandby command failed during master node failover.

Error

Gpactivatestandby failed

22.21003 Error in bringing the remote (other) master server down during master node failover.

Error

Could not shutdown remote master


Integrate DCA MIB with environment

This section describes how to integrate the DCA MIB with an environment. It includes the following topics:
• Change the SNMP community string
• Set an SNMP Trap Sink

Change the SNMP community string

The SNMP community string can be modified through the DCA Setup utility. Changing the SNMP community string through DCA Setup will update all hosts in the DCA. Follow the instructions below to modify the SNMP community string. The following restrictions apply when modifying the SNMP community string:
• The Greenplum Database must be version 4.1.1.3 or later. If the Greenplum Database is a version earlier than 4.1.1.3, the option to modify the SNMP community string will not be available.
• If the SNMP community string is modified while running Greenplum Database 4.1.1.3 or later, and the Greenplum Database is then downgraded to a version earlier than 4.1.1.3, the modified SNMP file will not function properly. Dial-home and health monitoring will also be affected.
• If the DCA cluster is expanded with new hosts, the new hosts will not automatically use the modified SNMP configuration. The updated SNMP configuration must be copied from an existing host to the new hosts.

1. Open an SSH connection to the Primary Master server and log in as the user root.

2. Start the DCA Setup utility: # dca_setup

3. Select option 2 to Modify DCA Settings.

4. Select option 16 for Modify the Health Monitoring Configuration.

5. Select option 6 for Configure the SNMP Community.

6. Enter the new SNMP community string at the following prompt: Would you like to modify the SNMP Community? Current Setting = public. Press Enter to keep this setting.

7. Enter A to apply the above settings.

Set an SNMP Trap Sink

You can specify up to 6 SNMP Trap Sink servers through the DCA Setup utility. Follow the instructions below to set Trap Sink servers:

1. Open an SSH connection to the Primary Master server and log in as the user root.

2. Start the DCA Setup utility: # dca_setup

3. Select option 2 to Modify DCA Settings.

4. Select option 16 for Modify the Health Monitoring Configuration.


5. Select option 7 for trap hosts.

6. Enter the IP or qualified name of a trap server at the following prompt: Please enter a trap server.

7. Specify if you want to add an additional trap server at the following prompt: Would you like to add another trap host? (Yy|Nn).

8. Enter A to apply the above settings.
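To confirm that a trap sink is reachable, you can send a test trap from the Master server. This is an illustrative check only, assuming the Net-SNMP tools are installed; the community string and trap host below are placeholders for your own values.

# Send a generic SNMPv2 coldStart trap (.1.3.6.1.6.3.1.1.5.1) to the configured trap sink.
snmptrap -v 2c -c public trap-sink.example.com '' .1.3.6.1.6.3.1.1.5.1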

General Database Maintenance Tasks

Like any database management system, GPDB requires that certain tasks be performed regularly to maintain optimum performance. The tasks discussed here are required, but because they are repetitive they can easily be automated using standard UNIX tools such as cron scripts. It is the database administrator's responsibility to set up appropriate scripts and to see that they run successfully. This section contains the following topics:
• Routine Vacuum and Analyze
• Routine Reindexing
• Managing GPDB Log Files

Routine Vacuum and Analyze

Because of the multi-version concurrency control (MVCC) transaction model used in GPDB, data rows that are deleted or updated still occupy physical space even though they are not visible to any new transactions. A database with many updates and deletes generates a large number of expired rows. Running the VACUUM SQL command reclaims this disk space. The VACUUM command also collects table-level statistics, such as the number of rows and pages, so it is necessary to periodically run VACUUM on all tables.

Transaction ID Management

GPDB's MVCC transaction semantics must be able to compare transaction ID (XID) numbers to determine visibility to other transactions. However, since transaction IDs have limited size, a GPDB system that runs for a long time (more than four billion transactions) would suffer transaction ID wraparound: the XID counter wraps around to zero, so that transactions that occurred in the past appear to occur in the future, which means their outputs become invisible. To avoid this, you must run VACUUM on every table in every database at least once every two billion transactions. For more information, see the DCAv2 Data Computing Appliance Administration Guide.
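To see how close each database is to the wraparound limit, a query against the system catalog can help. The following is a minimal, illustrative check; run it as the gpadmin user on the Master server. The postgres database name is used here only as a convenient connection target, and the threshold you act on should come from the administration guide.

# Show the age, in transactions, of the oldest unfrozen XID in each database.
psql -d postgres -c "SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;"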

System Catalog Maintenance

System performance can be affected by numerous database updates with the CREATE and DROP commands, which can cause the system catalog to grow. For example, after a large number of DROP TABLE statements, the overall performance of the system can degrade due to excessive data scanning during metadata operations on the catalog tables. Depending on your system, the performance loss can be caused by thousands to tens of thousands of DROP TABLE statements.

GPDB recommends that you periodically run VACUUM on the system catalog to clear the space occupied by deleted objects. If numerous DROP statements are a part of your regular database operations, you can safely run a system catalog maintenance procedure with VACUUM at off-peak hours every day. This can be done while the system is running and available.


The following sample script performs a VACUUM of the GPDB system catalog:

#!/bin/bash
DBNAME=""
VCOMMAND="VACUUM ANALYZE"
psql -tc "select '$VCOMMAND' || ' pg_catalog.' || relname || ';' from pg_class a, pg_namespace b where a.relnamespace=b.oid and b.nspname='pg_catalog' and a.relkind='r'" $DBNAME | psql -a $DBNAME
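One way to run such a script regularly is a cron entry for the gpadmin user (edited with crontab -e). The following entry is a sketch only; the schedule, script path, and log path are placeholders, not values defined by the DCA.

# Hypothetical example: run the catalog VACUUM script daily at 02:00 and append its output to a log.
0 2 * * * /home/gpadmin/scripts/vacuum_catalog.sh >> /home/gpadmin/gpAdminLogs/vacuum_catalog.log 2>&1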

Vacuum and Analyze for Query Optimization

GPDB uses a cost-based query planner that relies on database statistics. Accurate statistics allow the query planner to better estimate selectivity and the number of rows retrieved by a query operation in order to choose the most efficient query plan. The ANALYZE command collects column-level statistics needed by the query planner. Both VACUUM and ANALYZE operations can be run in the same command. For example:

=# VACUUM ANALYZE mytable;
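The same operation can also be run database-wide from the shell with the vacuumdb client utility shipped with PostgreSQL-based systems. This is a minimal sketch, assuming the Greenplum client utilities are on the gpadmin user's PATH; the database name is a placeholder.

# Run VACUUM ANALYZE against every table in the database "mydb" (placeholder name).
vacuumdb --analyze mydb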

Routine Reindexing

For B-tree indexes, a freshly constructed index is somewhat faster to access than one that has been updated many times, because logically adjacent pages are usually also physically adjacent in a newly built index. It might help to reindex periodically to improve access speed. Also, if all but a few index keys on a page have been deleted, there is wasted space on the index page that a reindex can reclaim. In GPDB it is often faster to drop an index (DROP INDEX) and recreate it (CREATE INDEX) than to use the REINDEX command. Bitmap indexes are not updated when changes are made to the indexed columns. If you update a table that has a bitmap index, you must drop and recreate the index for it to remain current.
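The sketch below shows the drop-and-recreate approach described above. The database, index, table, and column names are placeholders; substitute your own objects and schedule the work for an off-peak window.

# Drop the existing B-tree index and rebuild it from scratch.
psql -d mydb -c "DROP INDEX idx_sales_date;"
psql -d mydb -c "CREATE INDEX idx_sales_date ON sales (txn_date);"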

Managing GPDB Log Files

This section contains the following topics:
• Database Server Log Files
• Management Utility Log Files

Database Server Log Files

GPDB log output tends to be voluminous, especially at higher debug levels, and you do not need to save it indefinitely. Administrators need to rotate the log files periodically so that new log files are started and old ones are removed after a reasonable period of time. GPDB has log file rotation enabled on the master and all segment instances. Daily log files are created in the pg_log directory on the master and in each segment data directory, using the naming convention gpdb-YYYY-MM-DD.log. Although log files roll over daily, they are not automatically truncated or deleted. Administrators must implement a script or program to periodically delete old log files in the pg_log directory of the master and of each segment instance.
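As an illustration of such cleanup, a find command can remove log files older than a chosen retention period. The master data directory path and the 30-day retention below are examples only; confirm the correct paths for your system before scheduling anything like this.

# Delete daily server log files older than 30 days from the master's pg_log directory (example path).
find /data/master/gpseg-1/pg_log -name 'gpdb-*.log' -mtime +30 -delete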


Management Utility Log Files

By default, log files for the GPDB management utilities are written to the gpAdminLogs directory in the home directory of the gpadmin user (~/gpAdminLogs). The naming convention for management log files is <script_name>_<date>.log. The log file for a particular utility is appended to its daily log file each time that utility is run. Administrators need to implement a script or program to periodically clean up old log files in ~/gpAdminLogs.
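A similar, illustrative cleanup can be applied to the management utility logs; the retention period and path are examples only.

# Delete management utility log files older than 30 days from the gpadmin user's log directory.
find /home/gpadmin/gpAdminLogs -type f -name '*.log' -mtime +30 -delete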

Next Steps

For information on connecting to a GPDB system running on the DCA, see the DCAv2 Data Computing Appliance Administration Guide available on http://docs.pivotal.io.


7. Power Down the DCA

To safely shut down and power off DCA hardware and software, perform the following tasks in sequence:
• Task 1: Connect to the DCA Master Server
• Task 2: Stop the GPDB software and shut down the OS
• Task 3: Place the PDU power switches in the OFF position

Stop all running queries and data loading before you power down the DCA.

Task 1: Connect to the DCA Master Server

The fastest method to shut down a DCA is to SSH in to a Master Server through an external network connection. If the external connection is not available and you have a service laptop, connect to the DCA as described in this procedure. This procedure assumes you are using the Windows Operating System.

1. Locate the system rack of the DCA. The system rack contains the Primary and Standby Master servers. Master servers are highlighted in red in Figure 7.1.

Figure 7.1 Master Servers in the System rack

2. Locate the red service cable on the laptop tray and connect it to your laptop. The red service cable is connected to port 48 on the Administration switch.

3. From your Windows laptop, navigate to Start > Control Panel > Network and Internet > Network and Sharing Center.

4. On the left pane click Change adapter settings.

5. Right-click Local Area Connection and select Properties.

6. From the Networking tab select Internet Protocol Version 4 (TCP/IPv4).

7. Click Properties.

8. Select Use the following IP address, and then enter the following IP address and subnet mask:

• IP address: 172.28.3.253

• Subnet mask: 255.255.248.0

9. Click OK.

10. Click Close.

11. Open an SSH client (such as PuTTY) and enter:

• Host Name (or IP address): 172.28.4.250

• Connection type: SSH

12. Click Open. If this is the first time you have connected to this server, a security alert will display.

13. Click Yes to continue.

14. Log in as the user root with password changeme. If the default password changeme was changed, enter the current password.

Task 2: Stop the GPDB software and shut down the OS

To ensure data consistency across primary and mirror segments, you must stop the GPDB software correctly.

1. Stop health monitoring as the user root:
   $ su -
   # dca_healthmon_ctl -d

2. Stop GPDB:
   $ gpstop -af

3. Disable HAWQ (if applicable):
   $ ssh hdm2
   $ /etc/init.d/hawq stop
   $ exit

4. Disable Hadoop (if applicable):
   $ icm_client list
   $ icm_client stop -l

5. Disable Pivotal Command Center (if applicable):
   # /etc/init.d/commander stop

6. Stop Command Center:
   $ gpcmdr --stop
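Before starting the shutdown utility, it can be useful to confirm that no Greenplum server processes remain. The check below is illustrative only and is not part of the documented procedure; run it on the Master server and, if desired, on each segment host over SSH.

# List any remaining postgres processes; the bracketed pattern keeps grep from matching itself.
ps -ef | grep -i '[p]ostgres'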

7. Start the DCA Shutdown utility. Note that issuing the shutdown command immediately shuts down the DCA; make sure that you are ready to shut down the DCA before you issue this command:

# dca_shutdown

8. Verify that the green LED on the power button on each server turns off after 1-2 minutes (see Figure 7.2 and Figure 7.3).

9. If a server does not power off, power it off manually by pressing the power button.

Figure 7.2 Location of the power button on a GPDB server (applies also to Hadoop Masters and Workers)

Figure 7.3 Location of the power button on Master, DIA, and Hadoop Compute servers

Task 3: Place the PDU power switches in the OFF position

When the GPDB is stopped and the operating system is shut down on each server, it is safe to power off the system via the eight PDU power switches in each rack.

1. Starting from the rear of the System rack (Rack 1), locate the power switches in the upper and lower Power Zones A and B (see Figure 7.4).

Figure 7.4 Rack power switch locations

2. First place the power switches in lower Power Zones A and B in the OFF position, and then place the power switches in upper Power Zones A and B in the OFF position.

3. Power off the remaining racks in the same way, one rack at a time, first placing the power switches in the lower zone and then the upper zone in the OFF position. After a few seconds, there should be no lit LEDs on any components in the system. Shutdown is complete.

8. Next Steps

This chapter explains the next steps for implementing your data warehouse requirements in GPDB. It includes the following sections:
• Documentation Resources
• Providing User Access to GPDB
• Creating Databases and Loading Data

Documentation Resources

For DCA product-specific documentation, release notes, or software updates, go to the EMC Online Support site at http://support.emc.com, click Support By Product, and search for Data Computing Appliance. For Pivotal product-specific documentation and release notes, go to the Pivotal documentation site at http://docs.pivotal.io.

Providing User Access to GPDB

GPDB manages database access permissions using the concept of roles. The concept of roles subsumes the concepts of users and groups. A role can be a database user, a group, or both. Roles can own database objects (for example, tables) and can assign privileges on those objects to other roles to control access to the objects. Roles can be members of other roles, thus a member role can inherit the object privileges of its parent role.

Every GPDB system contains a set of database roles (users and groups). The roles are separate from the users and groups managed by the operating system on which the database process runs. However, for convenience you may want to maintain a relationship between operating system user names and GPDB role names, since many of the client applications use the current operating system user name as the default.

In GPDB, users log in and connect through the master instance, which then verifies their role and access privileges. In order to bootstrap the GPDB system, a freshly initialized system always contains one predefined superuser role. This role will have the same name as the operating system user that initialized the GPDB system. Customarily, this role is named gpadmin. In order to create more roles you first have to connect as this initial role. See the Pivotal Greenplum Database Administrator Guide for more information on creating additional roles in GPDB.

Creating Databases and Loading Data

After establishing your database connections, the next step is to begin creating databases and loading data. For information about creating databases, schemas, tables, and other database objects in GPDB and loading your data, see the Pivotal Greenplum Database Administrator Guide.


A. Red Hat Enterprise Linux End User License Agreement

You can find the Red Hat® Enterprise Linux® and Red Hat Applications End User License Agreement at: http://www.redhat.com/en/about/red-hat-end-user-license-agreements#rhel

Glossary

A

append-only tables An append-only (AO) table is a storage representation that allows only appending new rows to a table, but does not allow updating or deleting existing rows. This allows for more compact storage on disk because each row does not need to store MVCC transaction visibility information, which saves about 20 bytes per row. AO tables can also be compressed.

array The set of physical devices (hosts, servers, network switches, etc.) used to house a Greenplum Database system.

B

bandwidth Bandwidth is the maximum amount of information that can be transmitted along a channel, such as a network or I/O channel. This data transfer rate is usually measured in megabytes or gigabytes per second (MB/s or GB/s).

BI Business Intelligence (BI) is a broad category of applications and technologies for gathering, storing, analyzing, and providing access to data with the goal of helping users make better business decisions.

C

catalog See system catalog.

column-oriented table GPDB provides a choice of storage orientation models for a table: row or column. A column-oriented table stores its content on disk by column rather than by row. This storage model has performance advantages for certain types of queries. Only append-only tables can be column-oriented; heap tables are always row-oriented.

D

Data Computing Appliance Data Computing Appliance (DCA) is a self-contained data warehouse solution that integrates all of the database software, servers, and switches necessary to perform big data analytics. DCA is delivered racked and ready for immediate data loading and query execution.

data directory The data directory is the location on disk where database data is stored. The master data directory contains the global system catalog only — no user data is stored on the master. The data directory on the segment instances has user data for that segment plus a local copy of the system catalog. The data directory contains several subdirectories, control files, and configuration files as well.

DCA Data Computing Appliance. See Greenplum GP100.


distributed Certain database objects in GPDB, such as tables and indexes, are distributed. They are divided into equal parts and spread out among the segment instances based on a hashing algorithm. To the end-user and client software, however, a distributed object appears as a conventional database object.

distribution key In a GPDB table that uses hash distribution, one or more columns are used as the distribution key, meaning those columns are used to divide the data among all of the segments. The distribution key should be the primary key of the table or a unique column or set of columns.

distribution policy The distribution policy determines how to divide the rows of a table among the GPDB segments. GPDB provides two types of distribution policy: hash distribution and random distribution.

DDL Data Definition Language. A subset of SQL commands used for defining the structure of a database.

DML Data Manipulation Language. SQL commands that store, manipulate, and retrieve data from tables. INSERT, UPDATE, DELETE, and SELECT are DML commands.

E

ELT Extract, load, and transform (ELT) is a process in data warehousing that involves extracting data from outside data sources, loading the raw data into a high-performance database management system (such as GPDB), and then performing the data transformations within the database itself.

ETL Extract, transform, and load (ETL) is a process in data warehousing that involves extracting data from outside data sources, transforming it to meet the operational requirements of the data warehouse, and loading it into the target database.

G

gang For each slice of the query plan there is at least one query executor worker process assigned. During query execution, each segment will have a number of processes working on the query in parallel. Related processes that are working on the same portion of the query plan on different segments are referred to as gangs.

Greenplum Database Greenplum Database (GPDB) is the industry's first massively parallel processing (MPP) database server based on open-source technology. It is explicitly designed to support business intelligence (BI) applications and large, multi-terabyte data warehouses. GPDB is based on PostgreSQL.

Greenplum Database system An associated set of segment instances and a master instance running on an array, which can be composed of one or more hosts.

Greenplum GP100 The model name of the Greenplum GP100 half rack solution.

Greenplum GP1000 The model name of the Greenplum GP1000 full rack solution.

Greenplum instance The process that serves a database. An instance of GPDB comprises a master instance and two or more segment instances; however, users and administrators always connect to the database via the master instance.


GP100 See Greenplum GP100.

GP1000 See Greenplum GP1000.

H

hash distribution With hash distribution, one or more table columns is used as the distribution key for the table. The distribution key is used by a hashing algorithm to assign each row to a particular segment. Keys of the same value will always hash to the same segment.

heap tables Whenever you create a table without specifying a storage structure, the default is a heap storage structure. In a heap structure, the table is an unordered collection of data that allows multiple copies or versions of a row. Heap tables have row-level versioning information and allow updates and deletes. See also append-only tables and multiversion concurrency control.

host A host represents a physical machine or compute node in a GPDB system. In GPDB, one host is designated as the master. The other hosts in the system have one or more segments on them.

I

interconnect The interconnect is the networking layer of GPDB. When a user connects to a database and issues a query, processes are created on each of the segments to handle the work of that query. The interconnect refers to the inter-process communication between the segments and master, as well as the network infrastructure on which this communication relies.

I/O Input/Output (I/O) refers to the transfer of data to and from a system or device using a communication channel.

Isilon EMC Isilon is scale-out NAS storage that provides a powerful, simple, efficient way to consolidate and manage enterprise data and applications.

J

JDBC Java Database Connectivity is an application program interface (API) specification for connecting programs written in Java to data in a database management system (DBMS). The application program interface lets you encode access request statements in SQL that are then passed to the program that manages the database.

M

master The master is the entry point to a GPDB system. It is the database listener process (postmaster) that accepts client connections and dispatches the SQL commands issued by the users of the system. The master is where the global system catalog resides. However, the master does not contain any user data. User data resides only on the segments. The master does the work of authenticating user connections, parsing and planning the incoming SQL commands, distributing the query plan to the segments for execution, coordinating the results returned by each of the segments, and presenting the final results to the user.

master instance The database process that serves the GPDB master. See master.

mirror A mirror is a backup copy of a segment (or master) that is stored on a different host than the primary copy. Mirrors are useful for maintaining operations if a host in your GPDB system fails. Mirroring is an optional feature of GPDB. Mirror segments are evenly distributed among other hosts in the array. If a host that holds a primary segment fails, GPDB will switch to the mirror or secondary host.

motion node A motion node is a portion of a query execution plan that indicates data movement between the various database instances of GPDB (segments and the master). Some operations, such as joins, require segments to send and receive tuples to one another in order to satisfy the operation. A motion node can also indicate data movement from the segments back up to the master.

MPP Massively Parallel Processing.

multiversion concurrency control Unlike traditional database systems which use locks for concurrency control, GPDB (as does PostgreSQL) maintains data consistency by using a multiversion model (multiversion concurrency control or MVCC). This means that while querying a database, each transaction sees a snapshot of data which protects the transaction from viewing inconsistent data that could be caused by (other) concurrent updates on the same data rows. This provides transaction isolation for each database session. MVCC, by eschewing the explicit locking methodologies of traditional database systems, minimizes lock contention in order to allow for reasonable performance in multiuser environments. The main advantage to using the MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading.

MVCC See multiversion concurrency control.

O

ODBC Open Database Connectivity, a standard database access method that makes it possible to access any data from any client application, regardless of which database management system (DBMS) is handling the data. ODBC manages this by inserting a middle layer, called a database driver, between a client application and the DBMS. The purpose of this layer is to translate the application’s data queries into commands that the DBMS understands.

OLAP Online Analytical Processing (OLAP) is a category of technologies for collecting, managing, processing and presenting multidimensional data for analysis and management. OLAP leverages existing data from a relational schema or data warehouse (data source) by placing key performance indicators (measures) into context (dimensions). As of release 3.1, OLAP functions are supported in GPDB. In practice, OLAP functions allow application developers to compose analytic business queries more easily and more efficiently. For example, moving averages and moving sums can be calculated over various intervals; aggregations and ranks can be reset as selected column values change; and complex ratios can be expressed in simple terms.

OLTP Online Transactional Processing (OLTP) is a mode of database processing involving single, small updates from end-point applications and real-time transactional systems.


P

partitioned tables Partitioning is a way to logically divide the data in a table for better performance and easier maintenance. In GPDB, partitioning is a procedure that creates multiple sub-tables (or child tables) from a single large table (or parent table). The primary purpose is to improve performance by scanning only the relevant data needed to satisfy a query. Note that partitioned tables are also distributed.

Perl DBI Perl Database Interface (DBI) is an API for connecting programs written in Perl to database management systems (DBMS). Perl DBI is the most common database interface for the Perl programming language.

Pivotal Hadoop Pivotal HD Enterprise is an enterprise-capable, commercially supported distribution of Apache Hadoop 2.2 packages targeted to traditional Hadoop deployments.

PostgreSQL PostgreSQL is a SQL compliant, open source relational database management system (RDBMS). GPDB uses a modified version of PostgreSQL as its underlying database server. For more information on PostgreSQL go to http://www.postgresql.org.

postgresql.conf The server configuration file that configures various aspects of the database server. This configuration file is located in the data directory of the database instance. In GPDB, the master and each segment instance has its own postgresql.conf file.

postgres process The postgres process is the actual PostgreSQL server process that processes queries. The database listener postgres process (also known as the postmaster) creates other postgres subprocesses as needed to handle client connections.

postmaster In releases prior to GPDB 3.2 and PostgreSQL 8.2, the database listener process was called postmaster. The postmaster process was renamed to postgres process in GPDB 3.2 and PostgreSQL 8.2, however many users who are familiar with PostgreSQL still refer to the database listener process as the postmaster. In GPDB, there is a postgres database listener process for the master instance and each segment instance.

psql This is the interactive terminal to PostgreSQL and GPDB. You can use psql to access a database and issue SQL commands.

Q

QD See query dispatcher.

QE See query executor.

query dispatcher The query dispatcher (QD) is a process that is initiated when users connect to the master and issue SQL commands. This process represents a user session and is responsible for sending the query plan to the segments and coordinating the results it gets back. The query dispatcher process spawns one or more query executor processes to assist in the execution of SQL commands.

query executor A query executor process (QE) is associated with a query dispatcher (QD) process and operates on its behalf. Query executor processes run on the segment instances and execute their slice of the query plan on a segment.

query plan A query plan is the set of operations that GPDB will perform to produce the answer to a given query. Each node or step in the plan represents a database operation such as a table scan, join, aggregation or sort. Plans are read and executed from bottom to top. GPDB supports an additional plan node type called a motion node. See also slice.

R

rack A type of shelving to which components can be attached vertically, one on top of the other. Components are normally screwed into front-mounted, tapped metal strips with holes which are spaced so as to accommodate the height of devices of various U-sizes. Racks usually have their height denominated in U-units.

RAID Redundant Array of Independent (or Inexpensive) Disks. RAID is a system of using multiple hard drives for sharing or replicating data among the drives. The benefit of RAID is increased data integrity, fault-tolerance, and/or performance. Multiple hard drives are grouped and seen by the OS as one logical hard drive.

RAM Random Access Memory. The main memory of a computer system used for storing programs and data. RAM provides temporary read/write storage while hard disks offer semi-permanent storage.

random distribution With random distribution, table rows are sent to the segments as they come in, cycling across the segments in a round-robin fashion. Rows with columns having the same values will not necessarily be located on the same segment. Although a random distribution ensures even data distribution, there are performance advantages to choosing a hash distribution policy whenever possible.

S

segment A segment represents a portion of data in a GPDB database. User-defined tables and their indexes are distributed across the available number of segment instances in the GPDB system. Each segment instance contains a distinct portion of the user data. A primary segment instance and its mirror both store the same segment of data.

segment instance The segment instance is the database server process (postmaster) that serves segments. Users do not connect to segment instances directly, but through the master.

server See host.

slice In order to achieve maximum parallelism during query execution, GPDB divides the work of the query plan into slices. A slice is a portion of the plan that can be worked on independently at the segment level. A query plan is sliced wherever a motion node occurs in the plan, one slice on each side of the motion. Plans that do not require data movement (such as catalog lookups on the master) are known as single-slice plans.

star schema A relational database design often used in data warehousing. The star schema is organized around a central table (fact table) joined to a few smaller tables (dimension tables) using references. The fact table contains raw numeric items that represent relevant business facts (price, number of units sold, etc.).

system catalog The system catalogs are the place where a relational database management system stores schema metadata, such as information about tables and columns, and internal bookkeeping information. The system catalog in GPDB is the same as the PostgreSQL catalog, with some additional tables to support the distributed nature of the GPDB system and databases. In GPDB, the master contains the global system catalog tables. The segments also maintain their own local copy of the system catalog.

T

tuple A tuple is another name for a row or record in a relational database table.

W

WAL Write-Ahead Logging (WAL) is a standard approach to transaction logging. WAL’s central concept is that changes to data files (where tables and indexes reside) are logged before they are written to permanent storage. Data pages do not need to be flushed to disk on every transaction commit. In the event of a crash, data changes not yet applied to the database can be recovered from the log. A major benefit of using WAL is a significantly reduced number of disk writes.