The Cydra 5 Departmental Supercomputer Design Philosophies, Decisions, and Trade-Offs
Total Page:16
File Type:pdf, Size:1020Kb
The Cydra 5 Departmental Supercomputer Design Philosophies, Decisions, and Trade-offs B. Ramakrishna Rau, David W.L. Yen, Wei Yen, and Ross A. Towle Cydrome, Inc. 1 groups or departments of scien- work done at TRW Array Processors and tists and engineers.’ It costs about the To meet at ESL (a subsidiary of TRW). The poly- same as a high-end superminicomputer cyclic architecture3 developed at ($500,000 to $1 million), but it can achieve price-performance TRW/ESL is a precursor to the directed- about one-third to one-half the perfor- targets for a new dataflow architecture developed at mance of a supercomputer costing $10 to Cydrome starting in 1984. The common $20 million. This results from using high- minisupercomputer, a theme linking both efforts is the desire to speed, air-cooled, emitter-coupled logic team of computer support the powerful and elegant dataflow technology in a product that includes model of computation with as simple a many architectural innovations. scientists conducted hardware platform as possible. The Cydra 5 is a heterogeneous multi- The driving force behind the develop- processor system. The two types of proces- an exhaustive-and ment of the Cydra 5 was the desire for sors are functionally specialized for the enlightening- increased performance over superminis on different components of the work load numerically intensive computations, but found in a departmental setting. The investigation into the with the following constraint: The user Cydra 5 numeric processor, based on the should not have to discard the software, company’s directed-dataflow architec- relative merits of the set of algorithms, the training, or the ture,* provides consistently high perfor- available techniques acquired over the years. As a mance on a broader class of numerical result, the user would be able to move up computations than do processors based on architectures. in performance from the supermini to the other architectures. It is aided by the high- minisuper in a transparent fashion. This bandwidth main memory system with its transparency is important for a product stride-insensitiveperformance. The inter- such as the Cydra 5, which is aimed at the active processors offload all nonnumeric with minimal involvement from the inter- growth phase of the minisupercomputer work from the numeric processor, leaving active or numeric processors. market. Such a product must cater to a it free to spend all its time on the numeri- broader and less forgiving user group than A bersion of this article appeared In froc. 22nd Huwuii cal application. Lastly, the 1/0 processors lnl’/ Con5 on Syslerns Sciences, Jan. 3-6. 1989, the pioneers and early adopters who pur- permit high-bandwidth 1/0 transactions Kailua-Kona, Hawaii. chased first-generation minisupercom- 12 00l8-9162/89/0l00-00I2$01 00 c 1989 IEEE COMPUTER puters. Ideally, a departmental super- number of things. It should be affordable 68020 we designed a fast interactive computer will display none of the idiosyn- to a small group or department of processor incorporating a 16-Kbyte, zero- crasies of typical supercomputers and engineers or scientists, which means an wait-state cache. A scheme developed by minisupercomputers and, in fact, will entry-level price under $500,000. It should two of the authors4 maintained cache project the “feel” of a conventional be easy to use by such a group and not coherency in this multiprocessor envi- minicomputer, except for its much higher require a “high priesthood” of skilled sys- ronment. performance on numerically intensive tems analysts catering to the idiosyncrasies tasks. of the machine. Lastly, it should be We could not afford to develop an oper- designed to handle all of the work load ating system from scratch, so we selected created by a department, not just the Unix, the only nonproprietary operating Cydra 5 system numerical tasks. As noted, this includes system available. Every workstation and arc hi tec ture tasks such as compiling, text editing, minisupercomputer vendor has had to executing the operating system kernel, and make the same choice for the same reason. networking-tasks for which a supercom- As a result, Unix has become the de facto From the outset we were determined not puter architecture is not cost effective. standard operating system for the to build an attached processor. The clum- engineering and scientific community. The siness of the host processor/attached These goals led to one of the key deci- more difficult choice was between the two processor approach leads to difficulty of sions regarding the Cydra 5: to have a competing flavors of Unix: Berkeley 4.2 use and loss of performance. The pro- numeric processor, highly optimized for and AT&T System V. Although Berkeley grammer must manage two computer sys- numerical computing, and a tightly inte- 4.2 was clearly dominant in 1984-85, we tems, each with its own operating system grated general-purpose subsystem that believed that with the addition of virtual and its own separate memory. The pro- would handle the nonnumeric work load. memory and networking, and with grammer must explicitly manage the In other words the Cydra 5 was to be a het- AT&T’s more aggressive support, System movement of programs and data back and erogeneous multiprocessor, with each V would pull ahead by the time Cydra 5 forth between the two systems. Since the processor functionally specialized for a was introduced. Accordingly, we took a data transfer path is slow relative to the different, but complementary, class of deep breath and jumped on the System V processing speed of both computers, it jobs. bandwagon. becomes a performance bottleneck. Pro- Initially, we planned to acquire a Although Cydrix, Cydrome’s imple- gram development tools for the attached general-purpose processor on an original- mentation of Unix, complies with the processor, such as compilers and linkers, equipment-manufacturer basis and inte- System V interface definition, it does con- run on the host. To avoid an unhealthy grate into it a numeric processor of our tain a number of extensions, primarily dependence on a single brand of host com- own design. We had determined that we for performance reasons. For use with a puter, this software must be maintained on needed about 10 million instructions per supercomputer, Unix is a rather low- multiple brands of host computer. second of computation capability from the performance, uniprocessor operating sys- A self-sufficient, stand-alone computer general-purpose processor to handle the tem. We rewrote the kernel significantly to has none of these problems. On the other 1/0 load imposed by the application run- symmetrically (not in a master-slave fash- hand, it assumes the burden of perform- ning on the numeric processor, as well as ion) distribute it over multiple interactive ing all of the general-purpose, nonnumeric the rest of the general-purpose work load. processors so that one or more processors work-such as networking, developing We soon discovered that a superminicom- could be simultaneously executing in ker- programs, and running the operating puter in this performance range would nel mode. This allowed us to bring the system-in addition to the numerically itself have a list price of about aggregate computing capability of multi- intensive jobs for which it was originally $500,000-the targeted price for the entire ple processors to bear on the task of sup- intended. As we will show later, the trade- Cydra 5! We found consistently that lower porting the 1/0 for the numeric processor offs made in designing a supercomputer priced general-purpose computation application. The file and 1/0 systems also and a general-purpose processor are often engines whose performance and price were received numerous enhancements. diametrically opposed. As a result, each closer to what we wanted had underdevel- ends up being the most cost effective for oped 1/0 capability by departmental As a consequence of this series of design a different class of jobs. Whereas a super- supercomputer standards. This situation decisions, the Cydra 5 Departmental computer may have 20 to 30 times the per- remains unchanged, if you examine the Supercomputer is two computers in one: formance of a general-purpose processor current crop of workstations and super- a numeric processor that is the functional on numerically intensive tasks, it may have workstations. The only workable scheme equivalent of other minisupercomputers, only three or four times the performance that met both our cost and performance and a general-purposemultiprocessor that on general-purpose tasks. When price is constraints was to design our own general- plays the role of a front-end system (Fig- considered as well, the supercomputer purpose subsystem consisting of multiple ure 1). However, these two subsystems are ends up having poorer cost-performance microprocessor-based processors. very tightly integrated. They share the than the general-purpose processor on same memory system and peripherals and Following a careful evaluation, we nonnumeric tasks, since its expensive are managed by the same operating sys- chose the as yet unannounced Motorola floating-point hardware is irrelevant. tem. Because of this, the Cydra 5 with 68020 microprocessor. The various RISC Cydrix presents the illusion of a simple We wanted the Cydra 5 to be not only (reduced instruction set computer) uniprocessor system to the user who does a stand-alone computer but also a depart- microprocessors were only on the drawing not wish to be bothered with what is inside mental supercomputer. By this we mean a boards at the time. Around the 16-MHz the black box. January 1989 13 I I I I Main memory Support memory I Service processor , I I h I I 32 - 512 Mbytes 8-64Mbytes I ; I I - Structural analysis Operating system kernel Fluid dynamics File WO Comput at ional chemistry Networking Seismic processing Compilation Image processing Text editing Figure 1.