viewpoints

doi:10.1145/1859204.1859216
Mark Smotherman and Dag Spicer
Historical Reflections
IBM’s Single-Processor Efforts
Insights on the pioneering IBM Stretch and ACS projects.

Imagine a CPU designed to issue and execute up to seven instructions per clock cycle, with a clock rate 10 times faster than the reigning supercomputer. Imagine a design team of top experts in computer architecture, compilers, and computer engineering, including two future ACM A.M. Turing Award recipients, five future IBM Fellows, and five future National Academy of Engineering members. Imagine that this team explored advanced computer architecture ideas ranging from clustered microarchitecture to simultaneous multithreading. Is this a description of the latest microprocessor or mainframe? No, this is the ACS-1 supercomputer design from more than 40 years ago.

In the 1950s and 1960s IBM undertook three major supercomputer projects: Stretch (1956–1961), the System/360 Model 90 series, and ACS (both 1961–1969). Each project produced significant advances in instruction-level parallelism, and each competed with supercomputers from other manufacturers: Univac LARC, Control Data Corporation (CDC) 6600, and CDC 6800/7600, respectively. Of the three projects, the Model 90 series (91, 95, and 195) was the most successful and remains the best known today, in particular for its out-of-order processing of floating-point operations. But over the past few years, new information about the other two efforts has surfaced, and many previously unseen documents are now physically collected and available online at the Computer History Museum in Mountain View, CA.

[Image: The IBM Stretch supercomputer. Image courtesy of IBM.]

Stretch

Many histories recount that Stretch began in 1954 with efforts by Steve Dunwell.2,11 Less known is that Gene Amdahl, designer of the IBM 704 scientific computer, was assigned to design Stretch.10 Stretch was targeted at the needs of the Livermore and Los Alamos nuclear weapons laboratories, such as calculations for hydrodynamics and neutron diffusion. Unlike the vacuum-tube-based 704, which had identical machine and memory cycle times of 12 microseconds, the transistorized Stretch was expected to have a 100-nanosecond machine cycle time and a two-microsecond memory cycle time. In response to this logic/memory speed imbalance, Amdahl worked with John Backus to design an instruction look-ahead scheme they called asynchronous non-sequential (ANS) control.

Both Amdahl and Dunwell participated in the sales pitch to Los Alamos in 1955. When Los Alamos contracted with IBM in 1956 to build Stretch, Dunwell was given control of the project and Amdahl left the company.

Dunwell recruited several new graduates including Fred Brooks and John Cocke. Harwood Kolsky, a physicist at Los Alamos, joined the project in 1957. Cocke and Kolsky developed a simulator that guided design decisions, particularly the details of the look-ahead. In its final form, branches in Stretch that could not be pre-executed were predicted, and instructions on the predicted path were speculatively executed, an astounding innovation for the late 1950s.a

Stretch pioneered the use of transistor technology within IBM, and the circuitry and memory modules designed for Stretch were used to bring the 7000-series of computers to market much earlier than would have been otherwise possible. Stretch pioneered many of the ideas that would later define the System/360 architecture including the 8-bit byte, a generalized interrupt system, memory protection for a multiprogrammed operating system, and standardized I/O interfaces.3,4

Stretch became the IBM 7030. A total of nine were built, including one as part of a special-purpose computer developed for NSA, codenamed Harvest. However, Stretch did not live up to its initial performance goal of 60 to 100 times the performance of the 704, and IBM Chairman Tom Watson, Jr. announced a discount in proportion to this reduced performance and withdrew the 7030 from the market. Stretch was considered a commercial failure, and Dunwell was sent into internal exile at IBM for not alerting management to the developing performance problems. A number of years later, once Stretch’s contributions were more apparent, Dunwell was recognized for his efforts and made an IBM Fellow.

Despite the failure, Kolsky and others urged management to continue work on high-end processors. Although Watson’s enthusiasm was limited, he approved two projects named “X” and “Y” with goals of 10-to-20 and 100 times faster than Stretch, respectively.

Project X

Facing competition from the announcement of the CDC 6600 in 1963, Project X quickly evolved into the successful Model 90 series of the System/360.6 Seventeen Model 91s and 95s were built. CDC sales and profits were impacted, and CDC sued IBM in December 1968 alleging unfair competition. The U.S. Department of Justice filed a similar antitrust suit against IBM one month later. CDC and IBM settled in 1973, and the government’s antitrust action was dropped in 1982.

Project Y

Project Y was led by Jack Bertram at IBM Research beginning in 1963. Bertram recruited Cocke, along with Fran Allen, Brian Randell, and Herb Schorr. Cocke proposed decoding multiple instructions per cycle to achieve an average execution rate of 1.5 instructions per cycle.9 Cocke said this goal was a reaction to Amdahl, who after returning to IBM had postulated an upper limit of one instruction decode per cycle.8

In 1965, responding to the announcement of the CDC 6800 (later to become the 7600) and to Seymour Cray’s success with a small isolated design team for the 6600, Watson expanded Project Y and relocated it to California near the company’s San Jose disk facility. It was now called ACS (Advanced Computing Systems). The target customers were the same as for Stretch.

ACS-1

Amdahl was made an IBM Fellow in 1965, and Bob Evans encouraged him to informally oversee ACS. Although Amdahl argued strongly for System/360 compatibility, Schorr and other architects designed the ACS-1 around more robust floating-point formats of 48-bit single-precision and 96-bit double-precision as well as a much larger register set than System/360.7 Amdahl was subsequently excluded and thereafter worked on his own competing design.

Average access time to memory in ACS-1 was reduced by using cache memory.b A new instruction prefetch scheme was designed by Ed Sussenguth. Pipeline disruption from branches was minimized by using multiple condition codes along with prepare-to-branch and predication schemes created by Cocke and Sussenguth. Compiler optimizations were viewed as critical to achieving high performance, and Allen and Cocke made significant contributions to program analysis and optimization.1

As with Stretch, detailed simulation was critical. Developed by Don Rozenberg and Lynn Conway, the simulator was used for documentation as well as validation and improvement of the design. While working through the simulation logic for multiple instruction decode and issue, Conway invented a general method to issue multiple out-of-order instructions per machine cycle that used an instruction queue to hold pending instructions. Named “Dynamic Instruction Scheduling,”9 the scheme met the timing constraints and was quickly adopted.

Also like Stretch, advanced circuit technology was key to the performance goals. New circuits were developed with switching times in the nanosecond range, and new circuit packaging and cooling approaches were investigated.

In 1968, based on Amdahl’s cost/performance arguments for his own design and the increasing cost projections for unique ACS-1 software, management decided to make ACS System/360 compatible and appointed Amdahl as director. The next year the project was canceled when Amdahl pushed for three different ACS-360 models. Instead, the cache-enhanced Model 195 was announced, and approximately 20 were built.

After ACS

Although the project fizzled, Silicon Valley benefited from the collection of talent assembled for the ACS project. Amdahl stayed in California and began his own company, with two dozen former ACS engineers joining him to build the Amdahl 470. Other ACS project members moved to IBM’s San Jose disk drive facility or joined other companies in the Valley.

Cocke went back to the East Coast, staying with IBM, and he later expressed regret that very little of the ACS-1 design was ever published.5 Indeed, had the ACS-1 been built, its seven-issue, out-of-order design would have been the preeminent example of instruction-level parallelism. Instead, the combination of multiple instruction issue and out-of-order issue would not be implemented until 20 years later.

In discussing supercomputers in his autobiography, Watson blamed the “erratic” IBM efforts partly on his own “temper,” and said he had come to view IBM’s competition with CDC like General Motors’ competition with Ferrari for building a high-performance sports car.12

In the 1970s the supercomputer industry pursued vector processing, with CDC building the unsuccessful STAR-100 in 1974 and Cray Research building the very successful Cray-1 in 1976. IBM announced a vector add-on for its mainframes in 1985, but the extension did not sell well. IBM funded Supercomputer Systems Incorporated between 1988 and 1993, but SSI went bankrupt before delivering its multiprocessor computer. More recently IBM built a number of successful large clusters, including Blue Gene and Roadrunner. In the June 2010 “Top500” Supercomputer list, four of the top 10 supercomputers in the world were built by IBM.

Impact

Although only one of these three projects was a commercial success, each significantly contributed to the computer industry. Fred Brooks has described the impact Stretch had on IBM, and especially on the System/360 architecture.3 The Model 90 series influenced high-performance CPU design across the industry for decades. The impact of ACS was more indirect, through people such as Fran Allen, Gene Amdahl, John Cocke, Lynn Conway, and others. For example, Amdahl was able to recruit ACS engineers Bob Beall, Fred Buelow, and John Zasio to further develop high-speed ECL circuits and packaging for the Amdahl 470. Allen and Cocke disseminated their work on program optimization, and Cocke carried the idea of designing a computer in tandem with its compiler into later projects, one of which was the IBM 801 RISC. Conway went on to coauthor a seminal book on VLSI design and attributes much of her insight into design processes to her ACS experience.

Historical Collections

The Computer History Museum collection contains more than 900 Stretch and ACS-related documents donated by Harwood Kolsky, and many of these are online at http://www.computerhistory.org/collections/ibmstretch/. These documents shed light on contributions to Stretch and ACS by Amdahl, Cocke, and others, as well as recording the design trade-offs made during the Stretch project and the studies made of Stretch performance problems. The Museum has several oral history interviews online at http://www.computerhistory.org/collections/oralhistories/ and has made recent talks available on its YouTube channel at http://www.youtube.com/user/ComputerHistory/. The Museum also features a new exhibit, part of which focuses exclusively on supercomputers, including the CDC 6600 and Cray-1.

a The look-ahead supported recovery actions to handle branch mispredictions and provided for precise interrupts. Also, as in many current-day processors, complex instructions were broken into simpler parts (“elementalized” in Stretch terminology), with each part assigned a different look-ahead entry. However, this complexity was blamed for part of Stretch’s performance problems, and speculative execution was not attempted in IBM mainframes until some 25 years later.

b Cache memory was a new concept at the time; the IBM S/360 Model 85 in 1969 was IBM’s first commercial computer system to use cache.

References

1. Allen, F. The history of language processor technology in IBM. IBM Journal of Research and Development 25, 5 (Sept. 1981).
2. Bashe, C. et al. IBM’s Early Computers. MIT Press, Cambridge, MA, 1986.
3. Brooks, F., Jr. Stretch-ing is great exercise - It gets you in shape to win. IEEE Annals of the History of Computing 32, 1 (Jan. 2010).
4. Buchholz, W. Planning a Computer System: Project Stretch. McGraw-Hill, Inc., Hightstown, NJ, 1962.
5. Cocke, J. The search for performance in scientific processors. Commun. ACM 31, 3 (Mar. 1988).
6. Pugh, E., Johnson, L. and Palmer, J. IBM’s 360 and Early 370 Systems. MIT Press, Cambridge, MA, 1991.
7. Schorr, H. Design principles for a high-performance system. In Proceedings of the Symposium on Computers and Automata (New York, April 1971).
8. Shriver, B. and Capek, P. Just curious: An interview with John Cocke. IEEE Computer 32, 11 (Nov. 1999).
9. Smotherman, M. IBM Advanced Computing Systems (ACS); http://www.cs.clemson.edu/~mark/acs.html.
10. Smotherman, M. IBM Stretch (7030) - Aggressive Uniprocessor Parallelism; http://www.cs.clemson.edu/~mark/stretch.html.
11. Spicer, D. It’s not easy being green (or “red”): The IBM Stretch project. Dr. Dobb’s Journal, July 21, 2001.
12. Watson, Jr., T. and Petre, P. Father Son & Co. Bantam, NY, 1990, 384.

Mark Smotherman ([email protected]) is an associate professor in the School of Computing at Clemson University, SC.

Dag Spicer ([email protected]) is Senior Curator at the Computer History Museum in Mountain View, CA.

Copyright held by author.

30 communications of the acm | december 2010 | vol. 53 | no. 12