INFOBrief Dell™ PowerEdge™ 2650 High Performance Computing Clusters
Key Points

- Second generation of Dell's High Performance Computing Cluster (HPCC) provides computation-intensive capacity leveraging the latest technology available in the market.
- HPCC is a cost-effective method for delivering a parallel computing system platform, targeted towards compute- and data-intensive applications.

High Performance Computing Clusters (HPCC) offer a cost-effective, scalable parallel computing platform designed for demanding, compute-intensive applications. Through Dell HPCC, users can aggregate standards-based server and storage resources into powerful supercomputers, providing an inexpensive yet powerful solution.
Dell Product Group, Updated November 2003

High Performance Computing Clusters (HPCCs) are a popular approach to solving complex computational problems because of their low price points and excellent scalability. Dell helps provide investment protection by offering solutions based on industry-standard building blocks that can be re-deployed as traditional application servers as users integrate newer technology into their network infrastructures. Dell delivers high-volume, standards-based solutions into scientific and compute-intensive environments that can benefit from economies of scale, and lets customers add systems as requirements change. Dell's technology and methodology are designed to provide high reliability, price/performance leadership, easy scalability, and simplicity by bundling order codes for hardware, software, and support services for 8, 16, 32, and 64 node clusters.
Product Description

The concept of HPCC or "Beowulf" (the project name used by its original designers) clusters originated at the Center of Excellence in Space Data and Information Sciences (CESDIS), located at the NASA Goddard Space Flight Center in Maryland. The project's goal was to design a cost-effective, parallel computing cluster built from off-the-shelf components that would satisfy the computational requirements of the earth and space sciences community.
As cluster solutions have gained acceptance for solving complex computing problems, High Performance Computing Clusters (HPCC) are starting to replace supercomputers in this role. The low cost of commodity HPCC systems has changed the purchase decision from evaluating expensive proprietary solutions, where cost was not the primary issue, to evaluating vendors based on their ability to deliver exceptional price-to-performance ratios and support capabilities.
The strategy behind parallel computing is to "divide and conquer." By dividing a complex problem into smaller component tasks that can be worked on simultaneously, the problem can often be solved more quickly. This can help save time and resources, as well as monetary costs. Dell's HPCC uses a multi-computer architecture, as depicted in Figure 1. It features a parallel computing system that consists of one master node and multiple compute nodes connected via standard network interconnects. All of the server nodes in a typical HPCC run an industry-standard
operating system, which typically offers substantial savings over proprietary operating systems.
Figure 1 Logical View of High Performance Computing Cluster

[Figure: a master node (running parallel applications, a message passing library, Linux, and cluster management tools, and acting as file server/gateway) connected to multiple compute nodes and to external storage over standard network interconnects.]
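The "divide and conquer" strategy described above can be sketched in miniature. The example below is hypothetical and not part of the Dell HPCC stack: a "master" splits a summation into chunks, hands each chunk to a worker standing in for a compute node, and combines the partial results. A real cluster would spread the chunks across machines with a message passing library rather than run threads on one host.

```python
# Illustrative divide-and-conquer sketch; worker threads stand in for
# compute nodes, the calling code plays the master node's role.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(bounds):
    """Compute node role: solve one sub-range of the problem."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def parallel_sum_of_squares(n, workers=4):
    # Master node role: divide the range [0, n) into one chunk per worker.
    step = n // workers
    chunks = [(w * step, n if w == workers - 1 else (w + 1) * step)
              for w in range(workers)]
    # Compute node role: each worker solves its chunk independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial_sum, chunks)
    # Master node role: combine the partial results into the final answer.
    return sum(partials)
```

The speedup on a real cluster comes from each chunk running on separate hardware; the structure of the computation, split, solve independently, combine, is the same.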
The master node of the cluster acts as a server for the Network File System (NFS), handles job-scheduling tasks and security, and acts as a gateway to end-users. As the larger task is broken into sub-functions, the master node assigns each compute node one or more tasks to perform. As a gateway, the master node allows users to gain access to the compute nodes.
The sole task of the compute nodes is to execute assigned tasks in parallel. A compute node does not have a keyboard, mouse, video card, or monitor; access to compute nodes is provided via remote connections through the master node.
From a user's perspective, an HPCC appears as a Massively Parallel Processor (MPP) system. Users commonly access the system by logging into the master node, either directly or via Telnet or remote login from personal workstations. Once logged onto the master node, users can prepare and compile their parallel applications and spawn jobs on a desired number of compute nodes in the cluster.
In addition to compute nodes and master nodes, key components of HPCC include: systems management utilities, applications, file systems, interconnects, and storage and software solution stacks.
Dell OpenManage Systems Management

Because HPCC systems can consist of many nodes (clusters of thousands of nodes are possible), it is important to be able to monitor and manage these nodes from a single console. To help manage such a sizable cluster, Dell OpenManage™ systems management utilities are designed to provide system discovery, event filtering, systems monitoring, proactive alerts, inventory and asset management, as well as remote manageability for the compute nodes and master nodes. Embedded Remote Access (ERA) is a separate management fabric that offers features including remote power operations; virtualization of floppy, CD-ROM, and other peripherals; console redirection; and BIOS flash updates.
Applications

Applications may be written to run in parallel across multiple systems using the message-passing programming model. The jobs of a parallel application are spawned on compute nodes, which work collaboratively until the jobs are complete. During execution, the compute nodes use standard message-passing middleware to coordinate activities and pass information.

Parallel Virtual File System

The Parallel Virtual File System (PVFS) is used as a high-performance, large parallel file system for temporary storage and as an infrastructure for parallel I/O research. PVFS stores data on the existing local file systems of multiple cluster nodes, enabling many clients to access the data simultaneously. Within an HPC cluster, PVFS enables high-performance I/O comparable to that of proprietary file systems.

Interconnect

To communicate with each other, the cluster nodes are connected through a network. The choice of interconnect technology depends on the amount of interaction between nodes when an application is executed. Some applications are similar to batch environments, and the communication between compute nodes is limited; for these environments, Fast Ethernet may be adequate. In environments that require more frequent communication, however, a Gigabit Ethernet interconnect is preferable. Some application environments can also benefit from a special interconnect designed to provide high speed and low latency between the compute nodes. For these applications, Dell's bundles are available with Myricom's Myrinet.

High Performance Computing Cluster Solution Stack

Dell partners with service providers to deliver the software components necessary for implementing an HPCC solution. The HPCC stack includes the job scheduler, cluster management, message passing libraries, and compilers.
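The message-passing model described above can be illustrated with a small sketch. This is an illustration only, not an actual middleware API: Python threads and queues stand in for cluster nodes and the network, and the explicit send/receive steps mirror the pattern MPI-style libraries provide.

```python
# Illustrative message-passing sketch: the master sends each "compute
# node" a task over an explicit channel and receives the result back,
# mirroring MPI-style send/recv semantics.
import queue
import threading

def compute_node(inbox, outbox):
    task = inbox.get()        # recv: wait for a task from the master
    outbox.put(task ** 2)     # send: return the result to the master

def master_node(tasks):
    results = []
    for task in tasks:
        inbox, outbox = queue.Queue(), queue.Queue()
        node = threading.Thread(target=compute_node, args=(inbox, outbox))
        node.start()
        inbox.put(task)               # send the task out to the node
        results.append(outbox.get())  # recv the node's result
        node.join()
    return results
```

In a real HPCC the channels are network links and the nodes are separate machines, but the coordination pattern, explicit sends and receives rather than shared memory, is the same.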
High Performance Computing Market
Target markets for high performance computing clusters are higher education, large corporations, the federal government, and technology sectors that require high-performance computation. Industry examples include oil and gas, aerospace, automotive, chemistry, national security, financial services, and pharmaceuticals.
Typical high-computation applications include war and airline simulations, financial modeling, molecular modeling, fluid dynamics, circuit board design, ocean flow analysis, seismic data filtering, and visualization.
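One of the workload classes listed above, financial Monte Carlo modeling, is easy to sketch. The model and all parameters below are invented for illustration; on a cluster, each compute node would run an independent batch of random trials and the master would average the results.

```python
# Illustrative Monte Carlo sketch: estimate the mean terminal value of
# an asset under a simple (hypothetical) random-walk return model.
import random

def simulate_terminal_value(start, drift, vol, steps, rng):
    """Run one random scenario: compound a noisy per-step return."""
    value = start
    for _ in range(steps):
        value *= 1 + drift + vol * rng.uniform(-1, 1)
    return value

def monte_carlo_mean(trials=5_000, seed=42):
    # On an HPCC, these independent trials would be split across nodes.
    rng = random.Random(seed)
    total = sum(simulate_terminal_value(100.0, 0.0005, 0.01, 250, rng)
                for _ in range(trials))
    return total / trials
```

Because every trial is independent, this kind of workload scales almost linearly with the number of compute nodes, which is why it appears in the finance row of Table 1.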
Applications that use HPC clusters and their specific vertical markets can be found in Table 1.
Table 1 Vertical Markets Appropriate for HPCC

Vertical | Description of Requirements | Typical Applications
Manufacturing | Crash worthiness, stress analysis, shock and vibration, aerodynamics | Fluent, Radioss, Nastran, Ansys, Powerflow
Energy | Seismic processing, geophysical modeling, reservoir modeling | VIP, Eclipse, Veritas
Life Sciences | Drug design, bioinformatics, DNA mapping, disease research | BLAST, CHARMM, NAMD, PC-GAMESS, Gaussian
Digital Media | Render farms | Renderman, Discreet
Finance | Portfolio management (Monte Carlo simulation), risk analysis | Barra, RMG, Sungard
Although market opportunities exist in environments made up of thousands of nodes, standard HPCC configurations target the majority of clusters within the 8-node to 128-node configuration range. Customers investigating larger cluster configurations should contact the Dell Professional Services organization for assistance.
Dell's bundled HPCC solutions target customers with varying levels of expertise, ranging from complete turnkey solutions, including hardware and software, to easy-to-order hardware-only bundles. For customers who require a complete solution, Dell also offers consulting assistance.
Features and Benefits
The Dell High Performance Computing Cluster leverages many advantages of Dell's product line, including server, storage, peripheral, and services components. By creating standard product offerings, Dell solutions are designed to help minimize configuration complexity. These standard packages consist of 8, 16, 32, 64, and 128 node configurations. The key technology features of a Dell High Performance Computing Cluster configuration are shown in Table 2.

Table 2 The Key Technology Features of a Dell HPCC Configuration
Feature | Function | Benefit
Full featured hardware configurations | Pre-bundled order codes for 8, 16, 32, 64, and 128 node configurations; 16-256 CPU configurations | Simplified ordering process and pre-qualified configurations
PowerEdge™ 2650 (Compute Node) | Dual Intel Xeon processors at 2.0GHz, 2.4GHz, 2.8GHz, 3.06GHz, and 3.2GHz with 533MHz FSB providing highest performance; 2U form factor; 3 slots on separate I/O buses; 256MB of DDR SDRAM, expandable to 12GB; configurable sized SCSI drives (expandable to 5 drives) for internal storage | High performance compute node for the most challenging applications; high density enables large compute clusters in a rack; helps to minimize I/O bottlenecks; flexibility for increasing storage capacity on compute node
PowerEdge 2650 (Master Node) | Dual Intel Xeon processors at 2.0GHz, 2.4GHz, 2.8GHz, 3.06GHz, and 3.2GHz with 533MHz FSB providing highest performance; 2U form factor; 3 slots on separate I/O buses; 256MB of DDR SDRAM, expandable to 12GB; configurable sized SCSI drives (expandable to 5 drives) for internal storage | High performance and highly available server in a dense form factor
Interconnect Options: Fast Ethernet (low cost), Gigabit Ethernet (high performance), Myrinet (high speed, low latency) | The interconnect technologies in an HPCC configuration allow servers to communicate with each other for node-to-node communications | The interconnect technology is designed for message passing between the nodes; offering Fast Ethernet and Gigabit Ethernet enables customers to choose between a low-cost or higher-performance solution, while Myrinet provides a high-speed, low-latency interconnect for application environments that require frequent node-to-node communication
Storage Devices | PowerVault™ 220S SCSI external storage device on the master node for primary storage | Provides a cost-effective method for a large amount of external storage that can be allocated across multiple channels for maximized I/O performance
Headless Operation | The ability to operate a system without keyboard, video, or mouse (KVM) | Simplifies cable management and helps lower the cost of the solution by eliminating monitors, keyboards, and mice
Redirection of serial port | ERA: enhanced features for out-of-band remote management allow centralized control of network devices through serial console ports | Helps increase manageability of the cluster from a single console device
Operating system software pre-install | Factory installation of the Red Hat Linux operating system | Facilitates setup of the cluster configuration
Wake on LAN | Provides the capability to remotely power on compute nodes over the Ethernet network | Remote management tool that can reduce system management workload, provide flexibility to the system administrator's job, and help save time-consuming effort and costs
HPCC Software Solution Stack (optional deliverable through DPS) | Cluster manager, compilers, job scheduler, MKL, BLAS, Atlas, MPI interface | Dell tested tools for creating a system environment for a parallel computing infrastructure
Server Management | Embedded systems management detects errors such as fan failures and temperature and voltage problems, which generate alerts and reports to the Dell OpenManage console | Detects and remedies problems within the cluster
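The Wake on LAN feature in Table 2 works by broadcasting a "magic packet": six 0xFF bytes followed by the target NIC's MAC address repeated 16 times, typically sent over UDP. A minimal sketch of how a management station might build and send one follows; the MAC address shown is a placeholder, not a real node.

```python
# Illustrative Wake on LAN sketch: build and broadcast a magic packet.
import socket

def build_magic_packet(mac: str) -> bytes:
    """Magic packet = 6 bytes of 0xFF + the 6-byte MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16

def wake_node(mac: str, broadcast: str = "255.255.255.255", port: int = 9):
    """Broadcast the magic packet; the sleeping NIC powers the node on."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Because the packet is recognized by the NIC hardware itself, the target system can be powered off entirely; only the NIC needs standby power.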
Key Customer Benefits
The performance of commodity computers and network hardware continually improves as new technology is introduced and implemented. At the same time, market conditions have led to decreases in the price of these components. As a result, it is now practical to build parallel computational systems based on low-cost, high-density servers, such as the Dell PowerEdge 2650, rather than buy CPU time on expensive supercomputers. Dell PowerEdge servers are tuned to take advantage of the existing server/OS/application combination. Dell PowerEdge server performance and price/performance are typically among the industry leaders on a variety of standard benchmarks (TPC-C, TPC-W, SPECweb99).
Low cost and high performance are only two of the advantages of using a Dell High Performance Computing Cluster solution. Other key benefits of HPCC versus large Symmetric Multiprocessor (SMP) systems are shown in Table 3.
Table 3 Comparison of SMP and HPCC Environments

Feature | Large SMPs | HPCC
Scalability | Fixed | Unbounded
Availability | High | High
Ease of Technology Refresh | Low | High
Application Porting | None | Required
Operating System Porting | Difficult | None
Service and Support | Expensive | Affordable
Standards vs. Proprietary | Proprietary | Standards
Vendor Lock-in | Required | None
System Manageability | Custom; better usability | Standard; moderate usability
Application Availability | High | Moderate
Reusability of Components | Low | High
Disaster Recovery Ability | Weak | Strong
Installation | Non-standard | Standard
The features compared in Table 3 are defined as follows:

Scalability: The ability to grow in overall capacity and to meet higher usage demand as needed. When additional computational resources are needed, servers can be added to the cluster; clusters can consist of thousands of servers.

Availability: The access to compute resources. To help ensure high availability, it is necessary to remove any single point of failure in the hardware and software, so that any individual system component, the system as a whole, or the solution (i.e., multiple systems) stays continuously available. An HPCC solution offers high availability because the components can be isolated and, in many cases, the loss of a compute node in the cluster does not have a large impact on the overall cluster solution; the workload of that node is allocated among the remaining compute nodes.

Ease of Technology Refresh: Integrating a new processor, memory, disk, or operating system technology can be accomplished with relative ease. In HPCC, as technology moves forward, modular pieces of the solution stack can be replaced as time, budget, and needs require or permit; there is no need for a one-time "switch-over" to the latest technology. In addition, new technology is often integrated more quickly into standards-based volume servers than into proprietary systems.

Service and Support: Total cost of ownership, including post-sales costs of maintaining the hardware and software, from standard upgrades to unit replacement to staff training and education, is generally much lower than for proprietary implementations, which typically require a high level of technical services due to their inherently complex and sophisticated nature.

Vendor Lock-in: Proprietary solutions require a commitment to a particular vendor, whereas industry-standard implementations are interchangeable.
Many proprietary solutions accept only components that have been developed by that vendor, and depending on the revision and technology, application performance may be diminished. HPCC enables solutions to be built from the best-performing industry-standard components.

System Manageability: System management is the installation, configuration, and monitoring of key elements of computer systems, such as hardware, operating system, and applications. Most large SMPs have proprietary enabling technologies (custom hardware extensions and software components) that can complicate system management. On the other hand, it is easier to manage one large system than hundreds of nodes. However, with wide deployment of network infrastructure and enterprise management software, it is possible to easily manage the multiple servers of an HPCC system from a single point.
Reusability of Components: Commodity components can be reused when taken offline, thereby preserving a customer's investment. In the future, when refreshing a Dell HPCC PowerEdge solution with next-generation platforms, the older Dell PowerEdge compute nodes can be redeployed as file/print servers, web servers, or other infrastructure servers.

Installation: Specialized equipment generally requires expert installation teams trained to handle such cases, as well as dedicated facilities such as power and cooling. For HPCC, since the components are "off-the-shelf" commodities, installation is generic and widely supported.
Hardware Options

The High Performance Computing Cluster configurations can be enhanced in the following ways:

- Increased memory in the compute nodes
- Increased internal HDD storage capacity in the compute nodes
- Increased external storage on the master node
- Additional NICs for the compute nodes and master node
- Faster interconnect technologies, through Dell Professional Services' recommendations
Related Web Sites

http://www.dell.com/clustering
http://www.oscar.org/
http://www.csm.ornl.gov/oscar/
http://www.beowulf.org/
http://www.dell.com/us/en/esg/topics/segtopic_servers_pedge_rackmain.htm
Service and Support

Dell HPCC systems come with the following:

- Three-year limited warranty(1), with three years of standard Next Business Day (NBD) parts replacement and one year of NBD on-site(2) labor
- 30-day "Getting Started" help line(3)
- DirectLine network operating system support upgrades, available with the limited three-year warranty
- Telephone support 24 hours a day, 7 days a week, 365 days a year for the duration of the limited three-year warranty
Dell Professional Services offers additional services to assist with:

- Solution design consultation
- Installation and setup
- Pre-staging of the solution at an off-site location
1 For a copy of our Guarantees or Limited Warranties, please write Dell USA, L.P., Attn: Warranties, One Dell Way, Round Rock, TX 78682. For more information, visit www.dell.com/us/en/gen/services/service_service_plans.htm.
2 Service may be provided by a third party. A technician will be dispatched if necessary following phone-based troubleshooting. Subject to parts availability, geographical restrictions, and the terms of the service contract. Service timing depends on the time of day the call is placed to Dell. U.S. only.
3 The 30-day telephone support program is provided at no additional charge to help customers with installation, optimization, and configuration questions during the critical 30-day period after shipment of PowerEdge systems. This program is available to customers who purchase Novell NetWare® or Microsoft Windows NT Server or Windows 2000 Server with their PowerEdge server from Dell. Support provided after the 30-day Getting Started Program will be for the Dell hardware only. Beyond 30 days from the invoice date, Dell's DirectLine telephone support service is available for purchase for NOS support.
Dell, OpenManage, PowerVault and PowerEdge are trademarks of Dell Computer Corporation. Microsoft and Windows NT are registered trademarks of Microsoft Corporation. Intel is a registered trademark of Intel Corporation. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell disclaims proprietary interest in the marks and names of others.
©Copyright 2002 Dell Computer Corporation. All rights reserved. Reproduction in any manner whatsoever without the express written permission of Dell Computer Corporation is strictly forbidden. For more information contact Dell. Dell cannot be responsible for errors in typography or photography.