Making a Case for Super

making a case for Super supercomputer evokes images of “big iron” and speed; it is the Formula 1 racecar of computing. As we venture forth into the new millennium, however, I argue that efficiency, reliability, and availability will become the dominant issues by the end of this decade, not only for supercomputing, but also for computing in general. Over the past few decades, the supercomputing industry has focused on and continues to focus on performance in terms of speed and horsepower, as evidenced by the annual Gordon Bell Awards for performance at Supercomput- ing (SC). Such a view is akin to deciding to purchase an automobile based primarily on its top speed and horsepower. Although this nar- row view is useful in the context of achieving WU-CHUN FENG, “performance at any cost,” it is not necessar- ily the view that one should use to pur- LOS ALAMOS NATIONAL LABORATORY chase a vehicle. The frugal consumer might 54 October 2003 rants: [email protected] more queue: www.acmqueue.com October 2003 55 QUEUE QUEUE It’s time for the computing community to use alternative metrics for evaluating performance. making a case for computing 54 October 2003 rants: [email protected] more queue: www.acmqueue.com October 2003 55 QUEUE QUEUE making a case for Supercomputing of new machine rooms, as shown in fi gure 1, and in some cases, requires the construction of en- FIGCourtesy of Lawrence Livermore National1 Library. tirely new buildings. The primary reason for this consider fuel effi ciency, reliability, and acquisition cost. less effi cient use of space is the exponentially increasing Translation: Buy a Honda Civic, not a Formula 1 racecar. power requirements of compute nodes, a phenomenon The outdoor adventurer would likely consider off-road I refer to as “Moore’s law for power consumption” (see prowess (or off-road effi ciency). Translation: Buy a Ford fi gure 2)—that is, the power consumption of compute Explorer sport-utility vehicle, not a Formula 1 racecar. nodes doubles every 18 months. This is a corollary to Correspondingly, I believe that the supercomputing (or Moore’s law, which states that the number of transistors more generally, computing) community ought to have per square inch on a processor doubles every 18 months alternative metrics to evaluate supercomputers— [1]. When nodes consume and dissipate more power, they specifi cally metrics that relate to effi ciency, reliability, and must be spaced out and aggressively cooled. availability, such as the total cost of ownership (TCO), Without the exotic housing facilities in fi gure 1, tradi- performance/power ratio, performance/space ratio, failure tional (ineffi cient) supercomputers would be so unreliable rate, and uptime. (due to overheating) that they would never be available for use by the application scientist. In fact, unpublished empirical data from two leading vendors corroborates In 1991, a Cray C90 vector supercomputer occupied that the failure rate of a compute node doubles with about 600 square feet (sf) andMOTIVATION required 500 kilowatts (kW) of power. The ASCI Q supercomputer at Los Alamos National Laboratory will ultimately occupy more than 21,000 sf and require 3,000 kW. Although the performance between these two systems has increased by nearly a factor of 2,000, the performance per watt has increased only 300-fold, and the performance per square foot has increased by a paltry factor of 65. This latter number implies that supercomputers are making less effi cient use of the space that they occupy, which often results in the design and construction FIG 2 56 October 2003 rants: [email protected] more queue: www.acmqueue.com October 2003 57 QUEUE QUEUE every 10-degree C (18-degree F) increase in temperature, the following metrics: performance/power ratio, as per Arrenhius’ equation when applied to microelec- performance/space ratio (or compute density), failure tronics; and temperature is proportional to power rate, and uptime. consumption. We can then extend this argument to the more general computing community. For example, for e-businesses Green Destiny, as shown in figure 3, is the name of our such as Amazon.com that use multiple compute systems 240-processor supercomputer that fits in a telephone to process online orders, the cost of downtime resulting boothEFFICIENT and SUPERCOMPUTINGsips less than 5.2 kW of power at full load (and from the unreliability and unavailability of computer only 3.2 kW when running diskless and computationally systems can be astronomical, as shown in table 1— idle). It provides affordable, general-purpose computing millions of dollars per hour for brokerages and credit card to our application scien- companies and hundreds of thousands of dollars per hour tists while sitting in an 85- for online retailers and services. This downtime cost has to 90-degree F dusty ware- two components: lost revenue (e.g., the end user “clicking house at 7,400 feet above over” to the competitor’s Web site) and additional hours sea level. More impor- of labor spent fixing the computer systems. tantly, it provides reliable computing cycles without Table 1. Estimated Costs of an Hour of Server any special facilities—that Downtime for Business Services is, no air conditioning, no Service Cost of One Hour of Downtime humidification control, Brokerage Operations $6,450,000 no air filtration, and no Courtesy of Los Alamos National Laboratory. Courtesy of Los Credit Card Authorization $2,600,000 ventilation—and without 3 FIG eBay $225,000 any downtime. (In con- Amazon.com $180,000 trast, a more traditional, Package Shipping Services $150,000 high-end 240-processor Home Shopping Channel $113,000 supercomputer such as a Catalog Sales Center $90,000 Beowulf cluster [4] gener- Source: Contingency Planning Research. ally requires a specially cooled machine room to Clearly, downtime should be a component in the total operate reliably, as such cost of ownership (TCO) of a computer system, whether a supercomputer easily the system is a Web-server farm or a supercomputer. But consumes as much as 36.0 what other components make up TCO? More gener- kW of power and cooling, ally than even downtime, TCO consists of two parts: (1) roughly seven times more cost of acquisition and (2) cost of operation. The former than Green Destiny.) is a one-time cost that can be defined as all the costs Green Destiny takes a incurred in acquiring a computer system—for example, novel and revolutionary procurement, negotiation, and purchase—and, thus, approach to supercomput- is relatively straightforward to quantify [2]. The latter, ing, one that ultimately however, is a recurring cost that consists of multiple redefines performance to components, including costs related to system integration encompass metrics that and administration, power and cooling, downtime, and are of more relevance space. Although the costs related to power and cooling to end users: efficiency, and space are easily quantifiable, the other operational reliability, and availability. costs—that is, system integration and administration and As such, Green Destiny is downtime—tend to be highly institution-specific and arguably the world’s most full of hidden costs [3]. As a result, I conclude that TCO efficient supercomputer as cannot be easily quantified. I instead focus on quantifying it provides a completely metrics that are related to TCO such as efficiency, integrated solution that reliability, and availability. Specifically, we propose is orders of magnitude 56 October 2003 rants: [email protected] more queue: www.acmqueue.com October 2003 57 QUEUE QUEUE making a case for Supercomputing superior to any other solution based on effi ciency, reli- temperatures of the processors differ by 57.3 degrees C ability, availability, versatility, management, self-monitor- (or 135.1 degrees F) when running a software-based DVD ing and measurement, and ease of use [5,6]. player. This means that based on the corroborated Ar- renhius equation, the conventional, low-power, mobile processor (without any active cooling) is 32 times more To achieve such effi ciency, reliability, and availability, we likely to fail than the Transmeta processor (without any designed an architecture around which we could appro- active cooling). priatelyTHE MAGIC stitch BEHIND together GREEN the modifi DESTINY ed building blocks of Although the Transmeta processor is signifi cantly Green Destiny. These building blocks include a Trans- more reliable than a conventional mobile processor, its meta-powered RLX ServerBlade as the compute node and Achilles’ heel is its fl oating-point performance. Conse- World Wide Packets’ Lightning Edge network switches quently, we modifi ed the CMS to create a “high-perfor- confi gured in a one-level tree topology for effi cient com- mance CMS” that improves fl oating-point performance munication, as shown in fi gure 4. by nearly 50 percent and ultimately matches the By selecting a Transmeta processor as a compute performance of the conventional mobile processor on a engine, Green Destiny takes a predominantly hardware- clock-cycle-by-clock-cycle basis. based approach to power-aware supercomputing. A On the network side, Green Destiny runs a software Transmeta processor eliminates about 75 percent of the confi guration for the Lightning Edge switches where fea- transistors used in a traditional RISC architecture and tures such as auto-negotiation are simply turned off, since implements the lost (but ineffi cient) hardware function- all link speeds are known. This reduces power consump- ality in its code-morphing software (CMS), a software tion down to a few watts per port. layer that sits directly on the Transmeta hardware. This approach results in a processor that runs cooler than other processors, as illustrated by fi gure 5, which shows Initially, we turned to the theoretical astrophysics com- the thermal images of a conventional, low-power, mobile munity for a scientifi c application to run on Green processor and a Transmeta processor.

Making a Case for Super

Green Destiny: a 240-Node Compute Cluster in One Cubic Meter

Service Manual

Memorandum in Opposition to Hewlett-Packard Company's Motion to Quash Intel's Subpoena Duces Tecum

The Technology Behind Crusoe™ Processors

USCOURTS-Ca9-09-35307-1.Pdf

TRANSMETA BREAKS X86 LOW-POWER BARRIER VLIW Chips Use Hardware-Assisted X86 Emulation by Tom R

Vertigo: Automatic Performance-Setting for Linux

The First International Conference on Mobile Systems, Applications, and Services

Vol 1-No 2 0801.Qxd

Linux Information Sheet Linux Information Sheet

The Technology Behind Crusoe™ Processors: Low-Power X86-Compatible Processors Implemented with Code-Morphing™ Software”

Credit Suisse Securities V. Simmonds (Brief in Opposition).Pdf