Best Practices for Deploying NAS

EMC Proven™ Professional Knowledge Sharing Certification 2009

Ron Nicholl
Sr. Systems Engineer
[email protected]
Large IT Division

Table of Contents

Introduction
Celerra Backend Storage Layout and Configuration
Fibre Channel Drives
CLARiiON ATA Drives
CLARiiON Initiator Registration
Symmetrix®
Symmetrix Fibre Adaptor Requirements
Importing Devices into the Celerra Gateway
Implementing a Reliable and Flexible Network Configuration
Planning Windows Domain Interaction
Best Practices for Backing up the Celerra Environment
Quick Reference for adding a Tape Device to the Celerra
Monitoring the Performance of the Celerra Solution
Conclusion
Additional Resources
Biography

Disclaimer: The views, processes or methodologies published in this article are those of the authors. They do not necessarily reflect EMC Corporation’s views, processes or methodologies.


Introduction

Deploying Celerra® Network Attached Storage (NAS) involves many pieces of the Information Technology (IT) infrastructure, ranging from backend storage to network topology and beyond. Choosing a solid design can make the difference between mediocre performance and exceeding expectations.

NAS solutions are quickly becoming a viable alternative to mitigate the cost of a wholly Storage Area Network (SAN) based storage solution. A Celerra NAS solution can offer much of the same functionality traditionally seen on the storage array over Internet Protocol (IP). This functionality includes replication, checkpoints, mirroring, and more. You will improve performance and reliability by applying best practices to your design.

This article will describe:

1. How to lay out the backend storage devices
2. How to implement a network configuration for greater flexibility and reliability
3. Planning Windows domain interaction
4. Best practices for backing up the Celerra environment
5. Monitoring the performance of the Celerra solution

There are many components to a successful NAS design. You can leverage the network topology to provide a greater scope of service. You can configure the Common Internet File System (CIFS) and Network File System (NFS) clients for greater performance and reliability. Applying standards can lower cost of ownership and improve reliability. This information will help you to optimally configure the Celerra.

Celerra Backend Storage Layout and Configuration

Layout and configuration of the backend storage is the first consideration when optimizing the Celerra. The Celerra Gateway is the focus here; integrated Celerras are configured at installation time using pre-defined storage templates that have been tested to uniformly balance performance and availability.

The CLARiiON® backend is the most common and cost effective primary storage used in conjunction with the Celerra. You must analyze existing workloads before any CLARiiON can be considered. Individual storage processor (SP) utilization should be well under 50% to accommodate existing performance levels as well as the addition of new NAS activity. Dedicating RAID Groups (RG) to NAS workloads is recommended. Competing workloads will impact performance and make performance estimates difficult; the more spindles you can dedicate to NAS the better.

Use the following general settings when creating new Logical Unit Numbers (LUNs):

LUN ID: Select the first available value
Element Size: 128
Rebuild Priority: ASAP
Verify Priority: ASAP
Enable Read Cache: Selected
Enable Write Cache: Selected
Enable Auto Assign: Cleared (off)
Number of LUNs to Bind: (see drive type and RAID configuration)
Alignment Offset: 0
LUN size: Must not exceed 2 TB

CLARiiON Fibre Channel Drives
RAID Types: 4+1 or 8+1 RAID 5; 1+1 RAID 1; no restriction for RAID 6

Create two LUNs of equal size using the total capacity of the RG. Assign each LUN to the opposite default owner SP so that both SPs are utilized equally.
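If you bind the LUNs from the Navisphere CLI rather than the GUI, the following sketch shows one way the general settings above might be applied to a Fibre Channel RG. The SP address, LUN numbers, RAID group, and capacities are illustrative, and the exact flag names should be verified against your Navisphere CLI release before use.

naviseccli -h <SP A IP> bind r5 20 -rg 10 -cap 500 -sq gb -sp a -elsz 128 -rc 1 -wc 1 -aa 0 -r ASAP -v ASAP
naviseccli -h <SP A IP> bind r5 21 -rg 10 -cap 500 -sq gb -sp b -elsz 128 -rc 1 -wc 1 -aa 0 -r ASAP -v ASAP

The only difference between the two commands is the -sp value, which alternates the default owner SP between the two LUNs.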

CLARiiON ATA Drives
RAID Types: 4+1 or 8+1 RAID 5; no restriction for RAID 6; 4+1 RAID 3 (create one LUN using the entire RG). We recommend RAID 3 for ATA drives; its sequential performance comes close to Fibre Channel.

Create two LUNs using the entire capacity of the RG in an 8+1, or a single LUN in a 4+1 RG. Each LUN within the RG needs the same default owner SP for the best performance because only one SP can address the RG at a time. Alternating LUN ownership from one RG to the next will evenly distribute utilization across the storage processors.


Once the RAID groups have been carved, create the storage group and select the new LUNs. It's important to change the Host LUN ID (HLU) to 16 or above when adding LUNs to a storage group; the CLARiiON defaults to zero for the first LUN added. HLUs below 16 are reserved for the Celerra's internal control volumes, so setting an HLU below 16 can cause serious issues.
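The same step can be scripted with the Navisphere CLI. A minimal sketch follows, assuming a storage group named Celerra_SG and array LUNs 20 and 21; the names and numbers are illustrative.

naviseccli -h <SP A IP> storagegroup -create -gname Celerra_SG
naviseccli -h <SP A IP> storagegroup -addhlu -gname Celerra_SG -hlu 16 -alu 20
naviseccli -h <SP A IP> storagegroup -addhlu -gname Celerra_SG -hlu 17 -alu 21

Here -alu is the array LUN number and -hlu is the Host LUN ID presented to the Celerra, which is where the 16-or-above rule applies.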

[Illustration: adding the new LUNs to the Celerra storage group.]


CLARiiON Initiator Registration

Manually register the initiators once the Celerra Gateway has been zoned to the CLARiiON. Always use redundant zoning practices.

CLARiiON initiator settings:
Hostname: Celerra Gateway Name
IP Address: 192.168.1.2 (the data movers do not have a CLARiiON agent; I have chosen the internal IP address of each data mover)
Initiator Type: CLARiiON Open
ArrayCommPath: Disabled
Failover Mode: 0
Unit Serial Number: Array


Symmetrix®

The Symmetrix platform can be used for data volumes and is generally recommended as the location for the Celerra’s internal control volumes.

We do not recommend the use of Symmetrix meta volumes. Instead, use standard hyper volumes. The Celerra can use standard mirrors, parity RAID, and RAID 5. Standard mirrors are typically used for the highest possible performance. It is important to choose a consistent protection standard if you plan to use the Celerra Automatic Volume Management (AVM). The Celerra will compile all of the devices into a single AVM pool “symm_std.”

Just like the CLARiiON, spindle count is important with the Symmetrix. Keeping hyper sizes small and uniform helps maximize the spindle count and realize the best performance. Create devices in multiples of eight for use with AVM; if multiples of eight are not possible, use even numbers (8, 4, or 2). The hyper emulation type must be CELERRA_FBA.

An example follows:
symconfigure -sid XXX -f <device file> -nop -v preview (then prepare, then commit)

Device file contents:
create dev count=X, size=12394, emulation=CELERRA_FBA, config=2-way-mir;

Symmetrix Fibre Adaptor Requirements

The Celerra requires that Fibre Adaptor (FA) ports have specific configurations. The ports should not be shared with other host types due to these specific FA requirements.

Validate current port configuration settings.

Specifically check the following:

Protocol Flag            State
Common_Serial_Number     Disabled
Avoid_Reset_Broadcast    Enabled
Volume_Set_Addressing    Disabled
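One way to view the current flag settings is with the Solutions Enabler symcfg command. A hedged sketch follows; the director and port numbers are illustrative, and the exact option names may vary by Solutions Enabler release.

symcfg -sid XXX list -fa <director number> -p <port number> -v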

symconfigure -sid XXX -f <device file> -nop -v preview (then prepare, then commit)

Device file contents:
set port XX:X <flag name>=enable|disable;

Importing Devices into the Celerra Gateway

Once the devices have been created and zoned to the Celerra Gateway, import them using Celerra Manager. Expand the Storage folder and then Systems. You will see a list of the storage systems imported into the Gateway and their health status. Below that is the "rescan" button. This will initiate the scan and subsequent import of new devices across each data mover, including the standby.

Run the following command on a per-data-mover basis to preview which devices will be added during the import process: server_devconfig server_x -probe -scsi -disks

Example of the output (truncated):
server_2 :
SCSI disk devices :
chain= 58, scsi-58 : no devices on chain
chain= 59, scsi-59 : no devices on chain
chain= 60, scsi-60 : no devices on chain
chain= 61, scsi-61 : no devices on chain
chain= 62, scsi-62 : no devices on chain
chain= 63, scsi-63 : no devices on chain

chain= 64, scsi-64 stor_id= APM00000000000 celerra_id= 0000000000000000
tid/lun= 1/0 type= disk sz= 102399 val= 7 info= DGC RAID 5 02241200100012NI
tid/lun= 1/1 type= disk sz= 102399 val= 8 info= DGC RAID 5 02241300110013NI
tid/lun= 1/2 type= disk sz= 102399 val= 9 info= DGC RAID 5 02242800120028NI
tid/lun= 1/3 type= disk sz= 102399 val= 10 info= DGC RAID 5 02242900130029NI
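If you prefer the command line to the Celerra Manager rescan, the probed disks can be saved into the data mover's device table with the -create form of the same command. A hedged sketch, run against each data mover:

server_devconfig server_x -create -scsi -disks

Recent DART releases also provide nas_diskmark -mark -all to scan all data movers in one step; verify its availability on your code level before relying on it.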

If you are adding a new CLARiiON array to an existing Celerra gateway, update /etc/hosts on the control station with the new CLARiiON information before you rescan. The Celerra control station will regularly talk to the CLARiiON SPs for health and performance information.

Example: Append the /etc/hosts file using the following format.
<SP A IP address> A_APM<CLARiiON serial number> # CLARiiON SP A
<SP B IP address> B_APM<CLARiiON serial number> # CLARiiON SP B
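Once the /etc/hosts entries are in place, you can confirm that the control station can reach and manage the CLARiiON before rescanning. A minimal sketch, assuming the nas_storage command on your DART release supports the -check option:

nas_storage -check -all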

The Celerra will create a new storage pool based on the new storage type and protection level. If you are adding additional LUNs or hyper volumes from an array the Celerra is already using, the new devices will be added to the existing storage pool.
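To confirm the result after the rescan, list the storage pools and check the capacity of the pool that was created or extended. A brief sketch; the pool name is illustrative.

nas_pool -list
nas_pool -size clar_r5_performance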

Implementing a Reliable and Flexible Network Configuration

Implementing a network configuration that provides flexibility and reliability depends on your use case. This section covers the Celerra high availability network configurations.

Balancing your solution’s cost with availability requirements is a challenge in most environments. Both EtherChannel and Link Aggregation are available with the Celerra. Both require that the network infrastructure meets specific qualifications before they can be used. They offer higher availability than any single port with the potential of greater total throughput. You should not expect an increase in bandwidth for any single host.

EtherChannel is a port trunking technology that allows grouping of multiple physical ports into one larger trunk, offering greater bandwidth and availability. EtherChannel requires ports in powers of two (two, four, or eight) to create a new trunked interface. The interface can be combined with a fail safe network (FSN) to provide even greater availability. The connected network switch ports must also be configured for EtherChannel, which is seen primarily on Cisco switches. Statistical load balancing distributes the traffic between ports.

Three options are available to determine how traffic is distributed to the trunk members:

• Media Access Control (MAC) address: a port is chosen based on the source and destination MAC addresses
• IP: the default; the source and destination IP addresses are used to direct traffic to the trunk members
• Transmission Control Protocol (TCP): the outbound port is chosen based on the source and destination IP addresses and TCP ports

Example: How to create a trunked interface. (The double quotes are required when using the device options.)
server_sysconfig <movername> -virtual -name <trunk name> -create trk -option "device=<port list> lb=<mac|ip|tcp>"

Link aggregation is the second method to create a highly available network by trunking. The Celerra's link aggregation is based on the IEEE 802.3ad Link Aggregation Control Protocol (LACP). This standard allows combining similar physical ports into one logical port and gives the administrator the opportunity to use an odd number of ports, up to a maximum of twelve. The protocol examines all connected ports and determines the number and speed of each connection; the speed shared by the majority of ports wins, even if it is not the fastest available connection.

The connection(s) running at the odd speed are not allowed to be part of the aggregation. The Celerra interface will appear as a trunk with frames being transmitted on each link. The network switch must support LACP for the protocol to work correctly. The same load balancing options used for EtherChannel apply to link aggregation (MAC, IP, TCP). LACP can be combined with an FSN to provide an extra layer of availability.

Example: How to create a trunk using the LACP protocol.
server_sysconfig <movername> -virtual -name <trunk name> -create trk -option "device=<port list> protocol=lacp lb=<load balance type>"

The Fail Safe Network (FSN) is a virtual device that can add an additional layer of availability for the Celerra network interfaces. An FSN is switch independent and is maintained within the Celerra. It has a primary and a passive secondary physical connection. The link status of the primary is monitored for interruption. If the primary link goes down, everything is transferred to the standby including the MAC address. It’s a great way to have switch fault tolerance and can be combined with EtherChannel and Link aggregation.

Example: How to create an FSN.
server_sysconfig <movername> -virtual -name <fsn name> -create fsn -option "device=<primary device>,<standby device>"
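Putting the pieces together, the following sketch builds an LACP trunk, layers an FSN over it with a standalone port as standby, and then assigns an IP interface to the FSN. The data mover, device names, and addresses are illustrative and should be adapted to your environment.

server_sysconfig server_2 -virtual -name trk0 -create trk -option "device=cge0,cge1 protocol=lacp lb=ip"
server_sysconfig server_2 -virtual -name fsn0 -create fsn -option "device=trk0,cge3"
server_ifconfig server_2 -create -Device fsn0 -name fsn0_int -protocol IP 10.1.1.50 255.255.255.0 10.1.1.255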

The following diagram combines a highly available trunked interface and an FSN. This offers the greatest protection to maintain Celerra access to the network. An increase in client performance should not be expected.

[Diagram: a data mover with an FSN whose primary is a trunked interface to switch A and whose standby is an Ethernet port to switch B, with both switches connecting to the network clients.]

Jumbo frames can improve Celerra network performance at the interface level. Jumbo frames change the Maximum Transmission Unit (MTU) of the network frame from 1500 bytes up to 9000 bytes. A single 9K jumbo frame replaces six standard 1.5k frames.

The overhead savings can reduce CPU utilization and increase network bandwidth. Jumbo frames are better suited to larger I/Os. The bad news is that jumbo frames require every component in the frame's path to support the larger MTU size, which relegates their use largely to the data center. They are great for server-to-server transfers if the requirements are met. Using jumbo frames without meeting the requirements results in data-layer fragmentation and reassembly penalties, and network performance can be adversely impacted as routers and switches slow down to do that work.

Example: How to change a Celerra interface to transmit and receive jumbo frames.
server_ifconfig <movername> <interface name> mtu=9000
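After changing the MTU, it is worth verifying that every hop in the path really passes jumbo frames. From a Linux client this can be done with a don't-fragment ping sized just under 9000 bytes (8972 bytes of payload plus 20 bytes of IP header and 8 bytes of ICMP header); the target address is illustrative.

ping -M do -s 8972 <Celerra interface IP>

If any device in the path cannot handle the frame, the ping fails rather than silently fragmenting.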

All of the configurations listed use some of the same basic command sets. The two most common are checking port and virtual interface configuration.

Example: How to check port configuration
server_sysconfig <movername> -pci
Slot: 2
Broadcom Gigabit Ethernet Controller
0: cge2 IRQ: 20
speed=1000 duplex=full txflowctl=disable rxflowctl=disable
1: cge3 IRQ: 21
speed=auto duplex=auto txflowctl=disable rxflowctl=disable

Example: How to check virtual interface information
server_ifconfig <movername> -all
server_2 :
protocol=IP device=fsn0
inet= netmask= broadcast=
UP, ethernet, mtu=1500, vlan=0, macaddr=0:60:16:1d:1e:c8
loop protocol=IP device=loop
inet=127.0.0.1 netmask=255.0.0.0 broadcast=127.255.255.255
UP, loopback, mtu=32768, vlan=0, macaddr=0:0:0:0:0:0 netname=localhost
el31 protocol=IP device=mge1
inet=192.168.2.2 netmask=255.255.255.0 broadcast=192.168.2.255
UP, ethernet, mtu=1500, vlan=0, macaddr=0:60:16:19:a0:5 netname=localhost
el30 protocol=IP device=mge0
inet=192.168.1.2 netmask=255.255.255.0 broadcast=192.168.1.255
UP, ethernet, mtu=1500, vlan=0, macaddr=0:60:16:19:a0:6 netname=localhost

Planning Windows Domain Interaction

Planning Windows domain interaction is an important factor when setting up the Celerra for CIFS access. You must consider many variables including multiple domain interaction, trust relationships, and access considerations.

It’s difficult to plan for every use case for the Celerra. It is important to define what environments will need access to the data mover. Map out the domains in your environment including trust relationships. Trust relationships will dictate cross domain access.

An example: clients in the Production A and Production B domains have access to all production and development domains. Clients in the Development C domain will be unable to access CIFS servers in Production A or B.

[Diagram: trust relationships among the Development C, Production A, and Production B domains.]

Example: How to join (or unjoin) a CIFS server to a domain
server_cifs <movername> -Join (or -Unjoin) compname=<computer name>,domain=<full domain name>,admin=<administrator account>,ou="ou=<organizational unit>"
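The CIFS server itself must exist on the data mover (or VDM) before it can be joined. A minimal sketch of creating one; the computer name, domain, and interface are illustrative values.

server_cifs <movername> -add compname=<computer name>,domain=<full domain name>,interface=<interface name>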

Include Virtual Data Movers (VDMs) if you have any plans to replicate a CIFS file system. A VDM is the container the Celerra uses to allow the CIFS configuration, audit logs, credentials, and Kerberos information to be replicated to a destination Celerra.

Example: How to create a new VDM on a data mover with CIFS running
nas_server -name <vdm name> -type vdm -create <movername>

Consider your organization's security policies and standards when creating new CIFS servers. It's possible to have CIFS servers from a development and a production environment connected to the same file system within the data mover. The default behavior when creating a new CIFS share is to allow all CIFS servers on the same data mover access to the file system. A best practice is to keep production and development activities separate. Some sensitive environments also prohibit different networks from touching the same piece of hardware.

It's always important to keep system times current, but with CIFS it's critical because of the Windows Kerberos authentication protocol. Created by the Massachusetts Institute of Technology and adopted by Windows, Kerberos facilitates a ticketing system for authentication. When a CIFS user connects, the Kerberos Key Distribution Center (KDC) on the domain controller (DC) issues a time-sensitive access ticket. If the time between the DC and the data mover skews by more than the default five minutes, you will have access issues.

Example: How to start the Network Time Protocol (NTP) on a data mover to keep time up to date.
server_date <movername> timesvc start ntp -i <poll interval> <NTP server IP address>
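A quick way to confirm the data mover's clock before and after starting the time service is simply to display it and compare it against a domain controller:

server_date <movername>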

The Domain Name System (DNS) relates host names to IP addresses. The Celerra DNS provides connectivity to the DNS lookup servers in the environment. Caching of previous lookups improves the lookup performance of clients accessing the Celerra. The Celerra allows up to three DNS lookup servers per domain on each data mover.

Example: How to configure DNS on the data mover.
server_dns <movername> <domain name> <DNS server IP address(es)>

Distributed File System (DFS) is a Windows service that allows for consolidation and portability of file systems without client interaction. DFS is the ‘middleman’ for CIFS share points. You realize some of the benefits during file system migrations. Without DFS, every client would have to be updated with a new share point every time a path changes. DFS centralizes and automates this; the Windows DFS client checks with the DFS server for path changes. Your mapped drive never needs to change.

Best Practices for Backing up the Celerra Environment

Protecting data in a NAS environment presents unique challenges. The Celerra can support both NFS and CIFS permission sets. The traditional methods of backing up file systems via CIFS or NFS no longer provide complete protection.

The Network Data Management Protocol (NDMP) bridges the gap in backing up multiprotocol file systems. NDMP requires a minimum of an NDMP host (data mover), NDMP client with backup software, and tape drive/tape. The flow of data starts with the backup software initiating a backup. A request is sent over the network to the NDMP host to begin the backup. The data mover sets up a new session. The data mover then begins the backup, and the data flows from the storage unit to the data mover and back out the AUX fibre port to the tape drive. The only traffic flowing over the network is control, catalog, and tape library commands.

[Diagram: NDMP backup data flow. The NDMP client (the host running the backup software) sends NDMP control data over Ethernet to the data mover acting as the NDMP host; the data mover reads from the storage system and sends the NDMP data stream and robotic control directly to the tape drive.]
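The backup software authenticates to the data mover with an NDMP user account that must be created on the Celerra beforehand. A hedged sketch; the username is illustrative and the exact options may vary by DART release.

server_user server_2 -add -md5 -passwd ndmpuser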

The Celerra supports a number of NDMP options such as Direct Access Restore (DAR), NDMP with integrated SNAP technology, and Volume Based Backups (VBB). DAR allows a restore to begin from the location on the tape where the data resides. In a basic NDMP restore, the tape is read from the beginning until the restore point is found, which significantly extends the total restore time. DAR works by sending historical and tape marker information to the backup software. The biggest drawback to DAR is a drop in backup performance due to the additional overhead, but restore times are significantly improved.

NDMP with integrated SNAP technology helps ensure consistent backups without the need for custom scripting. This feature automates the creation of snapshots of file systems and the subsequent NDMP backup. The SNAP sessions are automatically created and destroyed whether or not the backup completes successfully. Your backup software must support this option. Always consider the snap session limits of the Data Access Real Time (DART) code you are running and the available storage.

VBB is a relatively new function of Celerra NDMP backups. This option performs a block-level backup as opposed to a file-system-level NDMP backup; VBB backs up an entire meta-volume. It is best suited to file systems that have been created on dedicated volumes, since AVM takes slices from multiple meta-volumes. VBB supports both full and incremental backups. A read-only file system is required, but snapshots or Celerra TimeFinder® operations (for Symmetrix-based storage) can satisfy this need. Directory and file level filtering are not supported.

The NDMP protocol is a powerful tool for backing up the Celerra, but there are several points to monitor within the environment to maintain performance. The data being backed up via NDMP travels to the tape device over Fibre Channel, while the control data for the backup stream is sent over the Ethernet network to the host running the backup software. Validate that the server running the backup software has sufficient memory, CPU, and network bandwidth to keep up with the NDMP stream. Using DAR will impact system performance because of the additional catalog information that must be created and stored. If you are using 3-way NDMP backups, be sure to have sufficient capacity and error-free links between the primary and secondary data movers.

Backup software vendors often require you to have dedicated NDMP tape devices. This requirement can reduce the flexibility of your current backup infrastructure. When possible, sharing NDMP tape drives between your Celerra data movers will maintain a lower cost solution and has the benefit of pushing the tape drives into streaming the data to tape. If the drives are not pushed hard enough, you may "shoe shine" the tape device: the drive constantly stops and starts as it waits for the tape buffers to fill.

The Celerra backend will also impact your NDMP performance. Backups of file systems residing on the same spindles can cause contention on the backend storage. Mapping out where the file systems reside on the backend will allow you to strategically create backup policies. A tool called whereisfs is located on the control station to help map out where a file system resides.

A quick reference for adding a tape device to the Celerra follows.

Quick Reference for adding a Tape Device to the Celerra
The same steps must be completed for every data mover that is zoned to a tape device.

1. Verify the data mover can see the tape device after zoning is complete. Probe the data mover for any attached non-disks (gatekeepers will also show as non-disks).
server_devconfig <movername> -p -s -n

chain= 32, scsi-32 stor_id= celerra_id=
tid/lun= 0/0 type= tape info= IBM ULT3580-TD4 0000
tid/lun= 0/1 type= jbox info= IBM 00000000 000000000000000000 0

2. Update the Celerra device CAM tables, importing the new device.
server_devconfig <movername> -c -s -n

Discovering storage (may take several minutes)
server_2 : done

3. Validate the import was successful. List out the Celerra active device table:
server_devconfig <movername> -l -s -n
server_2 :
Scsi Device Table
name   addr      type  info
gk161  c16t0l15  disk  567079440360SI000187461179
tape1  c32t0l0   tape  IBM ULT3580-TD4 0000
jbox2  c32t0l1   jbox  IBM 00000000 000000000000000000 0
gk641  c64t1l0   disk  5 021604001000APM00052502173
gk961  c96t1l0   disk  5 021604001000APM00052502173
gk01   c0t0l15   disk  567079440450SI000187461179

Monitoring the Performance of the Celerra Solution

There are several methods to monitor the Celerra's performance. The following information covers the Celerra Manager, device-level reporting in the Celerra Monitor, and the Performance Manager available within EMC ControlCenter®.

The Celerra Manager is the most basic component for managing and monitoring the Celerra. The Celerra Manager provides basic statistics on data mover and network utilization to give you a high level view of the Celerra’s health.

[Screenshot: data mover utilization statistics in Celerra Manager.]

The Celerra Manager also provides the ability to easily set up notifications via email or Simple Network Management Protocol (SNMP) traps on data mover memory and CPU limits. Monitoring storage pool utilization is just as important as performance; start acting when utilization reaches 80% or more. This will give you time to plan and implement additional storage capacity.

The Celerra Monitor provides a comprehensive graphical interface for viewing detailed performance and configuration information. It requires the Advanced Edition license of the Celerra Manager. The Celerra Monitor displays data mover statistics, including detailed server, volume, and storage-system configurations, and can even provide statistics on individual CIFS and NFS calls. Its biggest drawback is the lack of long term historical data; the Celerra Monitor is not intended for long term performance data archive and retrieval.

The Performance Manager is a component of EMC ControlCenter and provides a powerful tool to monitor both the long and short term performance of multiple components of the storage environment. The benefit to Celerra performance is the ability to pull long term statistics on multiple key performance points and quickly correlate them with backend storage statistics. It’s common for a performance issue to appear only during monthly or quarterly cycles and be non-existent during day to day operations. Performance Manager gives you the long term data you need to make those correlations.

[Screenshot: Celerra statistics on data mover network performance in Performance Manager.]


[Screenshot: CLARiiON performance statistics on a RAID Group used by the Celerra.]

The whereisfs script is a quick and easy tool that is already on the Celerra control station. It provides details on which devices are being used by which file systems. With a CLARiiON backend, you will also get a report on which RAID groups compose a file system. It's a great way to map out the Celerra backend usage. If you have identified a hot disk or RG through Performance Manager or Navisphere® Analyzer, it's a short step to find out which file systems are impacted.

Location: /nas/tools/.whereisfs
usage: whereisfs {-all|-ld|-rg} [-ld|-rg]
Options:
-all  with a CLARiiON backend, prints all 3 CLARiiON reports; with a Symmetrix backend, prints the Symmetrix reports
-ld   with a CLARiiON backend, prints the CLARiiON report with LUNs and DVOLs
-rg   with a CLARiiON backend, prints the CLARiiON report with RAID Groups

Conclusion

Deploying Celerra NAS requires many considerations. Those considerations involve multiple areas of IT expertise including network, SAN, Windows, and UNIX. You can implement a cost effective and flexible NAS solution with detailed planning and sound best practices.

Additional Resources

The following EMC white papers are available on Powerlink to assist with the topics covered in this document:

Managing Celerra Volumes and File Systems with Automatic Volume Management
Configuring and Managing Celerra Network High Availability
Configuring NDMP Backups on Celerra
Configuring CIFS on Celerra

http://www.ndmp.org – a great reference site for everything NDMP. This site provides a history of the protocol, the details of the various revisions, and where things are headed.

Biography
