Bibliography

[1] D. Abramson and G. Egan: "The RMIT Data Flow Computer: A Hybrid Architecture" in Computer Journal, Vol. 33, No. 3 (Special Issue, June 1990), pp. 230-240.
[2] E. Abreu et al.: "The APx Accelerator" in Proceedings of the 2nd Symposium on the Frontiers of Massively Parallel Computation, 1988, pp. 413-417.
[3] Z. Abuhamadeh: "The GAM II Pyramid" in Proceedings of the 2nd Symposium on the Frontiers of Massively Parallel Computation, 1988, pp. 443-448.
[4] A. Agarwal et al.: "The MIT Alewife Machine: A Large-Scale Distributed-Memory Multiprocessor" in Workshop on Scalable Shared-Memory Multiprocessors, Seattle, June 1990, published by Kluwer Academic Publishers.
[5] H. Aiso, K. Sakamura and T. Ichikawa: "A Multi-Microprocessor Architecture for Associative Processing of Image Data" in M. Onoe, K. Preston, Jr. and A. Rosenfeld (Eds.): "Real Time/Parallel Computers," Plenum Press, 1981, pp. 203-217.
[6] D. L. Allen et al.: "MUSE: A Wafer Scale Systolic DSP" in Proceedings of the International Conference on Wafer Scale Integration, 1990, pp. 27-35.
[7] G. S. Almasi: "Overview of Parallel Processing" in Parallel Computing, Vol. 2 (1985), pp. 191-203.
[8] G. S. Almasi and A. Gottlieb: "Highly Parallel Computing," Benjamin/Cummings, 1989.
[9] M. Amamiya et al.: "Implementation and Evaluation of a List Processing Oriented Dataflow Machine" in Proceedings of the 13th Annual International Symposium on Computer Architecture, 1986, pp. 10-19.
[10] H. Amano et al.: "(SM)2-II - A New Version of the Sparse Matrix Solving Machine" in 12th Annual International Symposium on Computer Architecture, 1983, pp. 100-107.
[11] AMETEK Computer Research: "AMETEK Series 2010 Multicomputer" in Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988, Vol. I, pp. 836-839.
[12] J. M. Anderson et al.: "The Architecture of FAIM-1" in IEEE Computer, Vol. 20, No. 1 (January 1987), pp. 55-65.
[13] J. P. Anderson et al.: "D825: A Multiple Computer System for Command and Control" in AFIPS Conference Proceedings, Vol. 22, 1962 Fall Joint Computer Conference, Spartan Books, pp. 86-96. Reprinted in [43], pp. 447-455.
[14] M. Annaratone et al.: "The Warp Computer: Architecture, Implementation and Performance" in IEEE Transactions on Computers, Vol. C-36, No. 12 (December 1987), pp. 1523-1538.
[15] M. Annaratone and R. Rühl: "Performance Measurements on a Commercial Multiprocessor Running Parallel Code" in Proceedings of the 16th Annual International Symposium on Computer Architecture, 1989, pp. 307-314.
[16] M. Annaratone et al.: "The K2 Parallel Processor: Architecture and Hardware Implementation" in Proceedings of the 17th Annual International Symposium on Computer Architecture, 1990, pp. 92-103.


[17] M. Annaratone et al.: "Architecture, Implementation and System Software of K2" in A. Bode (Ed.): Distributed Memory Computing, Proceedings of the 2nd European Conference, EDMCC2, 1991, pp. 473-484.
[18] E. Appiani et al.: "EMMA 2 - An Industry Developed Hierarchical Multiprocessor for Very High Performance Signal Processing" in Proceedings of the 1st International Conference on Supercomputing Systems, 1985, pp. 310-319.
[19] ARCHIPEL: "VOLVOX MACHINES OVERVIEW," Archipel, 24, boulevard de l'Hôpital, 75005 Paris, France.
[20] R. Arlauskas: "iPSC/2 System: A Second Generation Hypercube" in Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988, Vol. I, pp. 38-42.
[21] C. V. W. Armstrong and E. T. Fathi: "A Fault-Tolerant Multimicroprocessor-Based Computer System for Space-Based Signal Processing" in IEEE Micro, Vol. 4, No. 6 (December 1984), pp. 54-65.
[22] Arvind and R. A. Iannucci: "A Critique of von Neumann Style" in 10th Annual International Symposium on Computer Architecture, 1983, pp. 426-436.
[23] K. Asanovic and J. R. Chapman: "Spoken Natural Language Understanding as a Parallel Application" in C. R. Jesshope and K. D. Reinartz (Eds.): CONPAR 88, Cambridge University Press, 1989, pp. 508-515.
[24] P. J. Ashenden, C. J. Barker and C. D. Marlin: "The Leopard Project" in Computer Architecture News, Vol. 15, No. 4 (September 1987), pp. 40-51.
[25] P. M. Athanas and H. F. Silverman: "Armstrong II: A Loosely Coupled Multiprocessor with a Reconfigurable Communications Architecture" in Proceedings of the 5th International Parallel Processing Symposium, 1991, pp. 385-388.
[26] W. C. Athas and C. L. Seitz: "Multicomputers: Message Passing Concurrent Computers" in IEEE Computer, Vol. 21, No. 8 (August 1988), pp. 9-24.
[27] R. R. Atkinson and E. M. McCreight: "The Dragon Processor" in Computer Architecture News, Vol. 15, No. 5 (October 1987), pp. 65-69.
[28] T. Baba, K. Ishikawa and K. Okuda: "A Two-Level Microprogrammed Multiprocessor Computer with Non-Numeric Functions" in IEEE Transactions on Computers, Vol. C-31, No. 12 (December 1982), pp. 1142-1156.
[29] R. G. Babb II (Ed.): "Programming Parallel Processors," Addison-Wesley, 1988.
[30] J.-L. Baer: "Whither a Taxonomy of Computer Systems" in IEEE International Workshop on Computer Systems Organization, 1983, pp. 3-9.
[31] A. F. Bakker et al.: "A Special Purpose Computer for Molecular Dynamics Computation" in Journal of Computational Physics, Vol. 90, No. 2 (October 1990), pp. 313-335.
[32] J. Ballam: "Future Plans For HEP Computing in the U.S." in L. O. Hertzberger and W. Hoogland (Eds.): Proceedings of the Conference on Computing in High Energy Physics, North-Holland, 1986, pp. 146-164.
[33] J.-L. Basille et al.: "A Typical Propagation Algorithm on the Line Processor SY.MP.A.T.I.: The Regional Labeling" in K. Preston, Jr. and L. Uhr (Eds.): "Multicomputers and Image Processing, Algorithms and Programs," Academic Press, 1982, pp. 99-110.
[34] F. Baskett, T. Jermoluk and D. Solomon: "The 4D-MP Graphics Superworkstation: Computing + Graphics = 40MIPS + 40MFLOPS and 100,000 Lighted Polygons per Second" in Spring COMPCON '88, pp. 468-471.
[35] H. B. Baskin, B. R. Borgerson and R. Roberts: "PRIME: A Modular Architecture for Terminal Oriented Systems" in AFIPS Conference Proceedings, Vol. 40, 1972 Spring Joint Computer Conference, pp. 431-437.
[36] K. E. Batcher: "The Massively Parallel Processor System Overview" in J. L. Potter (Ed.): "The Massively Parallel Processor," MIT Press, 1985, pp. 142-149.
[37] A. Bauch, R. Braam and E. Maehle: "DAMP - A Dynamic Reconfigurable Multiprocessor System with a Distributed Switching Network" in A. Bode (Ed.): Distributed Memory Computing, Proceedings of the 2nd European Conference, EDMCC2, 1991, pp. 495-504.

[38] A. Bauch et al.: "The Software-Monitor DELTA-T and Its Use for Performance Measurements of Some Farming Variants on the Multi-Transputer System DAMP" in L. Bougé et al. (Eds.): Proceedings of Parallel Processing CONPAR 92 - VAPP V, Springer-Verlag LNCS 634, pp. 67-78.
[39] BBN Advanced Computers Inc.: "Parallel Computing: Past, Present and Future," BBN, November 1990.
[40] M. S. Beckerman: "IBM 3090" in R. G. Babb II (Ed.): "Programming Parallel Processors," Addison-Wesley, 1988, pp. 93-103.
[41] J. Beetem, M. Denneau and D. Weingarten: "The GF11 Supercomputer" in Proceedings of the 12th Annual International Symposium on Computer Architecture, 1985, pp. 108-115.
[42] G. Bell: "Ultracomputers: A Teraflop Before Its Time" in Communications of the ACM, Vol. 35, No. 8 (August 1992), pp. 27-47.
[43] G. Bell and A. Newell: "Computer Structures: Readings and Examples," McGraw-Hill, 1971.
[44] M. Beltrametti, K. Bobey and J. R. Zorbas: "The Control Mechanism of the Myrias Parallel Computer System" in Computer Architecture News, Vol. 16, No. 4 (September 1988), pp. 21-30.
[45] R. Berger et al.: "The Lincoln Programmable Image Processing Wafer" in Proceedings of the International Conference on Wafer Scale Integration, 1990, pp. 20-26.
[46] V. P. Bhatkar: "Parallel Computing: An Indian Perspective" in H. Burkhart (Ed.): CONPAR '90 - VAPP IV, Springer-Verlag LNCS 457, pp. 10-25.
[47] L. N. Bhuyan and D. P. Agrawal: "Design and Performance of Generalized Interconnection Networks" in IEEE Transactions on Computers, Vol. C-32, No. 12 (December 1983), pp. 1081-1089.
[48] W. Bibel et al.: "Parallel Inference Machines" in P. Treleaven and M. Vanneschi (Eds.): "Future Parallel Computers," Springer-Verlag LNCS 272, 1987, pp. 185-226.
[49] L. Bic and R. L. Hartmann: "AGM: A Dataflow Database Machine" in ACM Transactions on Database Systems, Vol. 14, No. 1 (March 1989), pp. 114-146.
[50] L. Bic: "AGM: The Irvine Dataflow Database Machine" in V. M. Milutinovic (Ed.): "High Level Language Computer Architecture," Computer Science Press, 1989, pp. 387-412.
[51] J. R. Biel et al.: "The ACP Cooperative Processes User's Manual," Fermilab Computing Division, Document #GA0006, Nov. 1990.
[52] E. Binaghi et al.: "HCRC - Parallel Computer - A Massively Parallel Combined Architecture Supercomputer" in Microprocessing and Microprogramming, Vol. 25, No. 1-5 (January 1989), pp. 287-292.
[53] R. Bisiani: "The Harpy Machine: A Data Structure Oriented Architecture" in Papers of the 5th Workshop on Computer Architecture for Non-Numeric Processing, ACM Press, pp. 128-136.
[54] R. Bisiani and M. Ravishankar: "PLUS: A Distributed Shared-Memory System" in Proceedings of the 17th Annual International Symposium on Computer Architecture, 1990, pp. 115-124.
[55] R. Bisiani and M. Ravishankar: "Local Area Memory in PLUS" in M. Dubois and S. S. Thakkar (Eds.): "Scalable Shared Memory Multiprocessors," Kluwer Academic Publishers, 1992, pp. 301-311.
[56] T. Blank: "The MasPar MP-1 Architecture" in Proceedings of the IEEE COMPCON Spring '90, pp. 20-24.
[57] R. Blasko: "Highly-Parallel Computation Model and Machine for Logic Programming" in D. J. Evans, G. R. Joubert and F. J. Peters (Eds.): Proceedings of Parallel Computing 89, North-Holland, 1990, pp. 541-546.
[58] D. W. Blevins et al.: "BLITZEN: A Highly Integrated Massively Parallel Machine" in Proceedings of the 2nd Symposium on the Frontiers of Massively Parallel Computation, pp. 399-406.

[59] T. Bloch: "Two Recent Supercomputers, the CRAY-2 and SX-2" in L. O. Hertzberger and W. Hoogland (Eds.): Proceedings of the Conference on Computing in High Energy Physics, North-Holland, 1986, pp. 96-106.
[60] A. Bode et al.: "High Performance Multiprocessor System for Numerical Applications" in Proceedings of the 1st International Conference on Supercomputing Systems, 1985, pp. 460-467.
[61] A. Bode et al.: "A Highly Parallel Architecture Based on a Distributed Shared Memory" in G. L. Reijns and M. H. Barton (Eds.): "Highly Parallel Computers," Elsevier Science Publishers (North-Holland), 1987, pp. 19-28.
[62] W. J. Bolosky, R. P. Fitzgerald and M. L. Scott: "Simple But Effective Techniques for NUMA Memory Management" in Operating Systems Review, Vol. 23, No. 5 (December 1989), pp. 19-31.
[63] S. Borkar et al.: "iWARP: An Integrated Solution to High Speed Parallel Computing" in Proceedings of Supercomputing '88, pp. 330-339.
[64] L. Borrmann, M. Herdieckerhoff and A. Klein: "Tuple Space Integration into Modula-2, Implementation of the Linda Concept on a Hierarchical Multiprocessor" in C. R. Jesshope and K. D. Reinartz (Eds.): CONPAR 88, Cambridge University Press, 1989, pp. 659-666.
[65] W. J. Bouknight: "The ILLIAC IV System" in Proceedings of the IEEE, April 1972, pp. 369-388.
[66] K. Boyanov and K. Yanev: "A Family of Highly Parallel Computers" in H. Burkhart (Ed.): CONPAR '90 - VAPP IV, Springer-Verlag LNCS 457, pp. 569-580.
[67] K. Bratbergsengen and T. Gjelsvik: "The Development of the CROSS 8 and the HC16-186 Parallel (Database) Computers" in H. Boral and P. Faudemay (Eds.): "Database Machines," Springer-Verlag LNCS 368, 1989, pp. 359-372.
[68] J. C. Browne: "Understanding Execution Behaviour of Software Systems" in IEEE Computer, Vol. 17, No. 7 (July 1984), pp. 83-87.
[69] R. E. Buehrer et al.: "The ETH-Multiprocessor EMPRESS: A Dynamically Configurable MIMD System" in IEEE Transactions on Computers, Vol. C-31, No. 11 (November 1982), pp. 1035-1044.
[70] T. Burgaroff et al.: "The IBM Los Gatos Logic Simulation Machine Hardware" in Proceedings of the IEEE 1983 International Conference on Computer Design: VLSI in Computers, pp. 584-587.
[71] H. Burkhardt III: Announcement of the KSR1 Supercomputer, February 22, 1992.
[72] H. Burkhart et al.: "The M3 Multiprocessor Programming Environment" in C. R. Jesshope and K. D. Reinartz (Eds.): CONPAR 88, Cambridge University Press, 1989, pp. 446-455.
[73] P. Burns et al.: "The JPL/Caltech Mark IIIfp Hypercube" in Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988, Vol. I, pp. 872-884.
[74] V. Cantoni and S. Levialdi: "PAPIA: A Case History" in L. Uhr (Ed.): "Parallel Computer Vision," Academic Press, 1987, pp. 3-13.
[75] M. Chastain et al.: "The Convex C240 Architecture" in Proceedings of Supercomputing '88, pp. 321-329.
[76] T.-W. Chiu: "A Parallel Computer for Lattice Gauge Theories" in Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988, Vol. I, pp. 81-91.
[77] N. H. Christ and A. E. Terrano: "A Very Fast Parallel Processor" in IEEE Transactions on Computers, Vol. C-33, No. 4 (April 1984), pp. 344-349.
[78] D. R. Cheriton and R. A. Kutter: "Optimizing Memory-Based Messaging for Scalable Shared Memory Multiprocessor Architectures," Manuscript, Stanford University, 1993.
[79] D. R. Cheriton et al.: "Paradigm: A Highly Scalable Shared-Memory Multicomputer Architecture" in IEEE Computer, Vol. 24, No. 2 (February 1991), pp. 33-46.
[80] J. A. Clausing et al.: "A Technique for Achieving Portability among Multiprocessors: Implementation on the Lemur" in Parallel Computing, Vol. 2 (1985), pp. 137-162.

[81] C. L. Cloud: "The Geometric Arithmetic Parallel Processor" in Proceedings of the 2nd Symposium on the Frontiers of Massively Parallel Computation, pp. 373-381.
[82] G. Coghill and K. Hanna: "PLEIADES: A Multiprocessor Interactive Knowledge Base" in Microprocessors and Microsystems, Vol. 3, No. 2 (March 1979), pp. 77-82.
[83] R. P. Colwell et al.: "Architecture and Implementation of a VLIW Supercomputer" in Proceedings of Supercomputing '90, pp. 910-919.
[84] R. Comerford: "Engineering Workstations: Add-ons Add Versatility" in IEEE Spectrum, April 1992, pp. 46-51.
[85] A. Contessa et al.: "MaRS: A Combinator Graph Reduction Multiprocessor" in E. Odijk, M. Rem and J.-C. Syre (Eds.): "PARLE '89 - Parallel Architectures and Languages Europe," Springer-Verlag LNCS 366, 1989, Vol. I, pp. 176-192.
[86] "Coprocessor Array" in New Products, IEEE Computer, Vol. 24, No. 11 (November 1991), p. 76.
[87] A. Corradi and A. Natali: "Using the iAPX-432 as a Support for Chill Parallel Constructs" in Microprocessing and Microprogramming, Vol. 12, No. 3/4 (October/November 1983), pp. 159-165.
[88] G. R. Couranz, M. S. Gerhardt and C. J. Young: "Programmable Radar Signal Processing Using the RAP" in Tse-yun Feng (Ed.): "Parallel Processing, Proceedings of the 1974 Sagamore Conference," Springer-Verlag LNCS 21, 1974, pp. 37-52.
[89] M. Cripps, T. Field and M. Reeve: "An Introduction to ALICE: A Multiprocessor Graph Reduction Machine" in S. Eisenbach (Ed.): "Functional Programming Languages, Tools and Architectures," Ellis Horwood, 1987, pp. 111-128.
[90] J. Croll: "VAX 6000 Model 400 System Overview" in Proceedings of the IEEE COMPCON Spring '90, pp. 110-114.
[91] L. Curran: "Surprise - Apollo Reveals a 'Desktop' Supercomputer" in Electronics, Vol. 61, No. 5 (March 3, 1988), pp. 69-70.
[92] E. L. Dagless, M. D. Edwards and J. T. Proudfoot: "Shared Memory in the CYBA-M Multi-microprocessor" in IEE Proceedings, Part E, Vol. 130, 1982, pp. 116-124.
[93] W. Dally: "The J-Machine System" in P. H. Winston with S. A. Shellard (Eds.): Artificial Intelligence at MIT: Expanding Frontiers, Vol. 1, MIT Press, Cambridge, MA, 1990, pp. 548-581.
[94] W. Dally et al.: "The Message Driven Processor: An Integrated Multicomputer Processing Element" in Proceedings of the 1992 IEEE International Conference on Computer Design: VLSI in Computers & Processors, pp. 416-419.
[95] V. David et al.: "Partitioning and Mapping Communication Graphs on a Modular Reconfigurable Parallel Architecture" in L. Bougé et al. (Eds.): Proceedings of Parallel Processing CONPAR 92 - VAPP V, Springer-Verlag LNCS 634, pp. 43-48.
[96] E. S. Davidson: "A Multiple Stream Microprocessor Prototype System: AMP-1" in Proceedings of the 7th Annual Symposium on Computer Architecture, 1980, pp. 9-16.
[97] K. Deguchi, K. Tago and I. Morishita: "Integrated Parallel Image Processing on a Pipelined MIMD Multi-Processor System PSM" in Proceedings of the 10th International Conference on Pattern Recognition, 1990, pp. 442-444.
[98] L. Dekker: Personal Communication, 1993.
[99] D. Del Corso et al.: "TOMP Project" in IEE Proceedings, Part E, Vol. 136, No. 4 (July 1989), pp. 225-233.
[100] R. F. DeMara and D. I. Moldovan: "The SNAP-1 Parallel Artificial Intelligence Prototype" in Proceedings of the 18th Annual International Symposium on Computer Architecture, 1991, pp. 2-11.
[101] R. F. DeMara and D. I. Moldovan: "Design of a Clustered Multiprocessor for Real Time Natural Language Understanding" in Proceedings of the 5th International Parallel Processing Symposium, 1991, pp. 270-277.
[102] J. B. Dennis: "The Variety of Data Flow Computers" in Proceedings of the 1st International Conference on Distributed Computing Systems, 1979, pp. 430-439; reprinted in R. H. Kuhn and D. A. Padua (Eds.): "Tutorial on Parallel Processing," IEEE Computer Society Press, 1981, pp. 210-219.

[103] D. J. DeWitt: "DIRECT - A Multiprocessor Organization for Supporting Relational Database Management Systems" in IEEE Transactions on Computers, Vol. C-28, No. 6 (June 1979), pp. 395-406.
[104] D. J. DeWitt and R. H. Gerber: "GAMMA, a High Performance Dataflow Database Machine" in Proceedings of the 1986 Very Large Databases Conference, pp. 228-237.
[105] D. J. DeWitt et al.: "A Single User Evaluation of the Gamma Database Machine" in M. Kitsuregawa and H. Tanaka (Eds.): "Database Machines and Knowledge Base Machines," Kluwer Academic Publishers, 1988, pp. 370-386.
[106] T. Diede et al.: "The Titan Graphics Supercomputer Architecture" in IEEE Computer, Vol. 21, No. 9 (September 1988), pp. 13-30.
[107] D. C. Dinucci: "Alliant FX/8" in R. G. Babb II (Ed.): "Programming Parallel Processors," Addison-Wesley, 1988, pp. 27-42.
[108] D. C. Dinucci: "Loral LDF-100" in R. G. Babb II (Ed.): "Programming Parallel Processors," Addison-Wesley, 1988, pp. 125-141.
[109] J. Dongarra, R. v.d. Geijn and D. Walker: "A Look at Scalable Dense Linear Algebra Libraries" in Proceedings of the Scalable High Performance Computing Conference (SHPCC-92), 1992, pp. 372-379.
[110] J. Edler et al.: "Issues Related to MIMD Shared Memory Computers: The NYU Ultracomputer Approach" in Proceedings of the 12th Annual International Symposium on Computer Architecture, 1985, pp. 126-135.
[111] K. A. Elmquist: "Architectural and Design Perspectives in a Modular Multi-microprocessor, the DPS-1" in AFIPS Conference Proceedings, Vol. 48, 1979 National Computer Conference, pp. 587-593.
[112] P. H. Enslow Jr.: "Multiprocessor Organization - A Survey" in Computing Surveys, Vol. 9, No. 1 (March 1977), pp. 103-129.
[113] P. H. Enslow Jr. (Ed.): "Multiprocessors and Parallel Processing," Wiley, 1974.
[114] H. Essafi, M. Pic and D. Juvin: "I\;-Project / First Step: To Improve Data Manipulations and Representations on Parallel Computers" in L. Bougé et al. (Eds.): Proceedings of Parallel Processing CONPAR 92 - VAPP V, Springer-Verlag LNCS 634, pp. 503-508.
[115] S. E. Fahlman: "Massively Parallel Architectures for Artificial Intelligence: NETL, THISTLE, and Boltzmann Machines" in Proceedings of the National Conference on Artificial Intelligence, 1983, pp. 109-113.
[116] S. E. Fahlman and G. E. Hinton: "Connectionist Architectures for Artificial Intelligence" in IEEE Computer, Vol. 20, No. 1 (January 1987), pp. 100-109.
[117] H. Falk: "Computing Speeds Soar with Parallel Processing" in Computer Design, Vol. 27, No. 12 (June 15, 1989), pp. 49-58.
[118] F. M. Farmwald: "The S-1 Mark IIA Supercomputer" in J. S. Kowalik (Ed.): "High Speed Computation," Springer-Verlag, 1984, pp. 145-155.
[119] R. A. Fatoohi: "Vector Performance Analysis of the NEC SX-2" in Proceedings of the 1990 International Conference on Supercomputing, pp. 389-400.
[120] R. J. Fazzari and J. D. Lynch: "The Second Generation FPS T Series: An Enhanced Parallel Vector Supercomputer" in Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988, Vol. I, pp. 61-70.
[121] D. G. Feitelson et al.: "Issues in Run-Time Support for Tightly Coupled Parallel Processing" in Proceedings of SEDMS III, Symposium on Experiences with Distributed and Multiprocessor Systems, 1992, pp. 27-42.
[122] E. Fernandez et al.: "MPH - A Hybrid Parallel Machine" in Microprocessing and Microprogramming, Vol. 25, No. 1-5 (January 1989), pp. 229-232.
[123] S. Fernbach: "Parallelism in Computing" in Proceedings of the 1981 International Conference on Parallel Processing, pp. 1-4.
[124] C. Fernström, I. Kruzela and B. Svensson: "LUCAS, Associative Array Processor: Design, Programming and Application Studies," Springer-Verlag, LNCS 216, 1986.
[125] C. A. Finnila and H. H. Love: "The Associative Linear Array Processor" in IEEE Transactions on Computers, Vol. C-26, No. 2 (February 1977), pp. 112-125.

[126] J. A. Fisher: "Very Long Instruction Word Architectures and the ELI-512" in Proceedings of the 10th Annual International Symposium on Computer Architecture, 1983, pp. 140-150.
[127] M. Flagg: "Dataflow Principles Applied to Real-Time Multiprocessing" in Proceedings of the IEEE COMPCON Spring '89, pp. 84-89.
[128] P. M. Flanders et al.: "Experience Gained in Programming the Pilot DAP, a Parallel Processor With 1024 Processing Elements" in M. Feilmeier (Ed.): "Parallel Computers - Parallel Mathematics," North-Holland, 1977, pp. 269-274.
[129] M. J. Flynn: "Some Computer Organizations and Their Effectiveness" in IEEE Transactions on Computers, Vol. C-21, No. 9 (September 1972), pp. 948-960.
[130] T. Fossum and D. B. Fite: "Designing a VAX for High Performance" in Proceedings of the IEEE COMPCON Spring '90, pp. 36-43.
[131] D. E. Foulser and R. Schreiber: "The Saxpy Matrix-1: A General Purpose Systolic Computer" in IEEE Computer, Vol. 20, No. 7 (July 1987), pp. 35-43.
[132] T. J. Fountain: "CLIP 4: A Progress Report" in M. J. B. Duff and S. Levialdi (Eds.): "Languages and Architectures for Image Processing," Academic Press, 1981, pp. 283-291.
[133] T. J. Fountain: "Plans for the CLIP7 Chip" in S. Levialdi (Ed.): "Integrated Technology for Image Processing," Academic Press, 1985, pp. 199-214.
[134] G. C. Fox: "The Hypercube and the Caltech Concurrent Computation Program: A Microcosm of Parallel Computing" in B. J. Alder (Ed.): "Special Purpose Computers," Academic Press, 1988, pp. 1-39.
[135] "Floating Point Systems T-Series Hypercube" in Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988, Vol. I, pp. 840-842.
[136] G. Franceschetti et al.: "An Efficient SAR Parallel Processor" in IEEE Transactions on Aerospace and Electronic Systems, Vol. 27, No. 2 (March 1991), pp. 343-353.
[137] H. Fromm et al.: "Experiences with Performance Measurement and Modeling of a Processor Array" in IEEE Transactions on Computers, Vol. C-32, No. 1 (January 1983), pp. 15-30.
[138] H. Fuchs et al.: "Pixel Planes 5: A Heterogeneous Multiprocessor Graphics System Using Processor Enhanced Memories" in ACM SIGGRAPH Computer Graphics, Vol. 23, No. 3 (July 1989), pp. 79-88.
[139] T. Fukazawa et al.: "R-256: A Research Parallel Processor for Scientific Computation" in Proceedings of the 16th International Symposium on Computer Architecture, 1989, pp. 344-351.
[140] A. Fukuda, K. Murakami and S. Tomita: "Towards Advanced Parallel Processing: Exploiting Parallelism at Task and Instruction Level" in IEEE Micro, August 1991, pp. 16-19, 50-61.
[141] M. Furuichi, K. Taki and N. Ichiyoshi: "A Multi-Level Load Balancing Scheme for OR-Parallel Exhaustive Search Programs on the Multi-PSI" in SIGPLAN Notices, Vol. 25, No. 3 (Proceedings of the 2nd ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming, March 1990), pp. 50-59.
[142] I. Gaines et al.: "The ACP Multiprocessor System at Fermilab" in Computer Physics Communications, Vol. 45 (1987), pp. 323-329.
[143] D. Gajski et al.: "Cedar: A Large Scale Multiprocessor" in Proceedings of the 1983 International Conference on Parallel Processing, pp. 524-529.
[144] D. Gajski and J.-K. Peir: "Comparison of Five Multiprocessor Systems" in Parallel Computing, Vol. 2 (1985), pp. 265-282.
[145] E. F. Gehringer, J. Abullarade and M. H. Gulyn: "A Survey of Commercial Parallel Processors" in Computer Architecture News, Vol. 16, No. 4 (September 1988), pp. 75-107.
[146] E. F. Gehringer, D. P. Siewiorek and Z. Segall: "Parallel Processing - The Cm* Experience," Digital Press/DEC, 1986.

[147] P. Gemmar, H. Ischen and K. Luetjen: "FLIP: A Multiprocessor System for Image Analysis" in M. J. B. Duff and S. Levialdi (Eds.): "Languages and Architectures for Image Processing," Academic Press, 1981, pp. 245-256.
[148] P. Gemmar: "Image Correlation: Processing Requirements and Implementation Structures on a Flexible Image Processing System (FLIP)" in K. Preston, Jr. and L. Uhr (Eds.): "Multicomputers and Image Processing, Algorithms and Programs," Academic Press, 1982, pp. 87-98.
[149] W. K. Giloi: "Interconnection Networks for Massively Parallel Computer Systems" in P. Treleaven and M. Vanneschi (Eds.): "Future Parallel Computers," Springer-Verlag LNCS 272, 1987, pp. 321-348.
[150] E. Glück-Hiltrop, M. Ramlow and U. Shürfeld: "The Stollman Dataflow Machine" in E. Odijk, M. Rem and J.-C. Syre (Eds.): "PARLE '89 - Parallel Architectures and Languages Europe," Springer-Verlag LNCS 366, 1989, Vol. I, pp. 433-457.
[151] M. Gokhale et al.: "Building and Using a Highly Parallel Programmable Logic Array" in IEEE Computer, Vol. 24, No. 1 (January 1991), pp. 81-89.
[152] R. Goldberg et al.: "Progress on the Prototype PIPE" in Workshop on Computer Architecture for Pattern Analysis and Machine Intelligence, 1987, pp. 67-74.
[153] R. Gonzales-Rubio, A. Bradier and J. Rohmer: "DDC Delta Driven Computer - A Parallel Machine for Symbolic Processing" in P. Treleaven and M. Vanneschi (Eds.): "Future Parallel Computers," Springer-Verlag LNCS 272, 1987, pp. 286-298.
[154] A. Goto and S.-i. Uchida: "Towards a High Performance Parallel Inference Machine - The Intermediate Stage Plan of PIM" in P. Treleaven and M. Vanneschi (Eds.): "Future Parallel Computers," Springer-Verlag LNCS 272, 1987, pp. 299-320.
[155] T. Gotoh, S. Sasaki and M. Yoshida: "Two Image Processing Systems Challenging the Limits of Local Parallel Architecture" in Proceedings of the 1985 IEEE Computer Society Workshop on Computer Architecture for Pattern Analysis and Image Database Management, IEEE Computer Society Press, pp. 272-279.
[156] A. Gottlieb: "Architectures for Parallel Supercomputing," draft, 1992.
[157] A. Gottlieb et al.: "The NYU Ultracomputer - Designing a MIMD, Shared Memory Parallel Machine" in Proceedings of the 9th Annual Symposium on Computer Architecture, 1982, pp. 27-42.
[158] A. Gunzinger, S. Mathis and W. Guggenbühl: "The SYnchronous DAtaflow MAchine: Architecture and Performance" in E. Odijk, M. Rem and J.-C. Syre (Eds.): "PARLE '89 - Parallel Architectures and Languages Europe," Springer-Verlag LNCS 366, 1989, Vol. I, pp. 85-99.
[159] J. R. Gurd and C. C. Kirkham: "The Manchester Prototype Dataflow Computer" in Communications of the ACM, Vol. 28, No. 1 (January 1985), pp. 34-52.
[160] J. L. Gustafson, S. Hawkinson and K. Scott: "The Architecture of a Homogeneous Vector Supercomputer" in Journal of Parallel and Distributed Computing, Vol. 3, No. 3 (1986), pp. 297-304.
[161] A. Guzman: "A Heterarchical Multi-Microprocessor LISP Machine" in Proceedings of the Workshop on Computer Architecture for Pattern Analysis and Image Database Management, IEEE Computer Society Press, 1981, pp. 309-317.
[162] R. H. Halstead: "Overview of Concert MultiLisp - A Multiprocessor Symbolic Computing System" in Computer Architecture News, Vol. 15, No. 1 (March 1987), pp. 5-14.
[163] R. H. Halstead et al.: "Concert: Design of a Multiprocessor Development System" in Proceedings of the 13th Annual International Symposium on Computer Architecture, 1986, pp. 40-48.
[164] D. Hammerstrom: "A VLSI Architecture for High-Performance, Low Cost On-Chip Learning" in Proceedings of the International Joint Conference on Neural Networks, 1990, Vol. 2, pp. 537-543.
[165] W. Händler: "The Impact of Classification Schema on Computer Architecture" in Proceedings of the 1977 International Conference on Parallel Processing, pp. 7-15.
[166] A. Hang and R. Graybill: "The Martin Marietta Advanced Systolic Array Processor" in Proceedings of the 2nd Symposium on the Frontiers of Massively Parallel Computation, pp. 367-372.

[167] J. G. Harp: "ESPRIT Project P1085 - Reconfigurable Transputer Project" in Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988, Vol. I, pp. 122-127.
[168] R. E. Harper and J. H. Lala: "Fault-Tolerant Parallel Processor" in Journal of Guidance, Control and Dynamics, Vol. 14, No. 3 (May-June 1991), pp. 554-563.
[169] P. Harrison and M. Reeve: "The Parallel Graph Reduction Machine ALICE" in J. H. Fasel and R. M. Keller (Eds.): "Graph Reduction," Proceedings of a Workshop, Springer-Verlag LNCS 279, 1986, pp. 181-202.
[170] B. D. Harry et al.: "A Fault-Tolerant Communication System for the B-HIVE Generalized Hypercube Multiprocessor" in Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988, Vol. I, pp. 355-362.
[171] J. P. Hayes et al.: "A Microprocessor-Based Hypercube Supercomputer" in IEEE Micro, Vol. 6, No. 5 (October 1986), pp. 6-17.
[172] F. E. Heart et al.: "A New Minicomputer/Multiprocessor for the ARPA Network" in AFIPS Conference Proceedings, Vol. 42, 1973 National Computer Conference and Exhibition, pp. 529-537.
[173] J. Herath, T. Yuba and N. Saito: "Dataflow Computing" in A. Albrecht, H. Jung and K. Mehlhorn (Eds.): "Parallel Algorithms and Architectures," Springer-Verlag LNCS 269, 1987, pp. 25-36.
[174] L. O. Hertzberger: "New Architectures" in L. O. Hertzberger and W. Hoogland (Eds.): Proceedings of the Conference on Computing in High Energy Physics, North-Holland, 1986, pp. 17-33.
[175] A. J. G. Hey: "Supercomputing with Transputers - Past, Present and Future" in Proceedings of the 1990 International Conference on Supercomputing (Computer Architecture News, Vol. 18, No. 3), pp. 479-489.
[176] T. Higuchi et al.: "The Prototype Semantic Network Machine IXM" in Proceedings of the 1989 International Conference on Parallel Processing, pp. (I-)217-224.
[177] T. Higuchi et al.: "IXM2: A Parallel Associative Processor" in Proceedings of the 18th Annual International Symposium on Computer Architecture, 1991, pp. 22-31.
[178] M. D. Hill et al.: "Cooperative Shared Memory: Software and Hardware for Scalable Multiprocessors" in ACM Transactions on Computer Systems, August 1993.
[179] W. D. Hillis: "The Connection Machine," MIT Press, 1985.
[180] K. Hiraki, T. Shimada and K. Nishida: "A Hardware Design of the SIGMA-1, a Data Flow Computer for Scientific Computation" in Proceedings of the 1984 International Conference on Parallel Processing, pp. 524-531.
[181] C. A. R. Hoare: "Communicating Sequential Processes" in Communications of the ACM, Vol. 21, No. 8 (August 1978), pp. 666-677.
[182] R. W. Hockney: "MIMD in the USA - 1984" in Parallel Computing, Vol. 2 (1985), pp. 119-136.
[183] R. W. Hockney and C. R. Jesshope: "Parallel Computers," Adam Hilger, 1981.
[184] D. Y. Hollinden, D. A. Hasgen and P. A. Wilsey: "Experiences Implementing the Mintabs System on a MasPar MP-1" in Proceedings of SEDMS III, Symposium on Experiences with Distributed and Multiprocessor Systems, 1992, pp. 43-58.
[185] "Honeywell Information Systems, 6180 Multics and 6000 Series" in P. H. Enslow Jr. (Ed.): "Multiprocessors and Parallel Processing," Wiley, 1974, pp. 219-228.
[186] P. Hoogvorst et al.: "POMP or How to Design a Massively Parallel Machine with Small Developments" in E. H. L. Aarts, J. van Leeuwen and M. Rem (Eds.): PARLE '91, Springer-Verlag LNCS 505, pp. (I-)83-100.
[187] T. Horie et al.: "AP1000 Architecture and Performance of LU-Decomposition" in Proceedings of the 20th International Conference on Parallel Processing, 1991, pp. (I-)634-635.
[188] T. Hoshino and T. Shirakawa: "Mesh Connected Parallel Computer PAX for Scientific Applications" in Parallel Computing, Vol. 5 (1987), pp. 363-371.
[189] J. K. Howard, R. L. Malm and L. M. Warren: "Introduction to the IBM Los Gatos Logic Simulation Machine" in Proceedings of the IEEE 1983 International Conference on Computer Design: VLSI in Computers, pp. 580-583.

[190] B. R. Huff: "The CYBERPLUS Parallel Processing System - A Supercomputer Alternative" in L. O. Hertzberger and W. Hoogland (Eds.): Proceedings of the Conference on Computing in High Energy Physics, North-Holland, 1986, pp. 410-415.
[191] J. C. Hughes: "ParSiFal - The Parallel Simulation Facility" in C. Jesshope (Ed.): "Major Advances in Parallel Processing," The Technical Press - Unicom, 1986, pp. 146-154.
[192] "Hughes Aircraft Company H4400 Computer System" in P. H. Enslow, Jr. (Ed.): "Multiprocessors and Parallel Processing," Wiley, 1974, pp. 229-237.
[193] R. Hughey and D. P. Lopresti: "B-SYS: A 470-Processor Programmable Systolic Array" in Proceedings of the 20th International Conference on Parallel Processing, 1991, pp. (I-)580-583.
[194] K. Hwang: "Exploiting Parallelism in Multiprocessors and Multicomputers" in K. Hwang and D. DeGroot (Eds.): "Parallel Processing for Supercomputers & Artificial Intelligence," McGraw-Hill, 1989, pp. 31-68.
[195] K. Hwang and F. A. Briggs: "Computer Architecture and Parallel Processing," McGraw-Hill, 1985.
[196] K. Hwang, R. Chowkwanyan and J. Ghosh: "Parallel Architectures for Implementing Artificial Intelligence Systems" in K. Hwang and D. DeGroot (Eds.): "Parallel Processing for Supercomputers & Artificial Intelligence," McGraw-Hill, 1989, pp. 245-288.
[197] R. N. Ibbett, P. C. Capon and N. P. Topham: "MU6V: A Parallel Vector Processing System" in 12th Annual International Symposium on Computer Architecture, 1983, pp. 136-144.
[198] "IBM Scalable POWERparallel System Reference Guide," IBM Publication Number G325-0648-00, 1993.
[199] T. Ichikawa, K. Sakamura and H. Aiso: "A Multi-Microprocessor ARES with Associative Processing Capability on Semantic Data Bases" in AFIPS Conference Proceedings, Vol. 47, 1978 National Computer Conference, pp. 1033-1039.
[200] K. Inagaki, T. Kato and T. Sakai: "MACSYM: An Event-Driven Parallel Processor for Document Pattern Understanding" in Proceedings of the 6th International Conference on Pattern Recognition, 1982, pp. 258-261.
[201] A. Inoue and A. Amada: "The Architecture of a Multi-Vector Processor System VPP" in Proceedings of the International Conference on Vector and Parallel Processors in Computational Science III, 1988, in Parallel Computing, Vol. 8, Nos. 1-3 (October 1988), pp. 185-193.
[202] Intel Scientific Computers: "The Intel iPSC/2 System: The Concurrent Supercomputer for Production Applications" in Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988, Vol. I, pp. 843-846.
[203] M. Ishii et al.: "Cellular Array Processor CAP and Applications" in Proceedings of the International Conference on Systolic Arrays, IEEE Computer Society Press, 1988, pp. 535-544.
[204] M. Iwashita and T. Temma: "Experiments on a Dataflow Machine" in G. L. Reijns and M. H. Barton (Eds.): "Highly Parallel Computers," Elsevier Science Publishers (North-Holland), 1987, pp. 235-245.
[205] I. P. Jalowiecki and S. J. Hedge: "The WASP Demonstrator Programme: The Engineering of a Wafer-Scale System" in 1990 Proceedings of the International Conference on Wafer Scale Integration, pp. 43-49.
[206] I. P. Jalowiecki, K. D. Warren and R. M. Lea: "WASP: A WSI Associative String Processor" in 1989 Proceedings of the International Conference on Wafer Scale Integration, pp. 83-94.
[207] J. W. Jang and W. Przytula: "Trade-Offs in Mapping FFT Computations onto Fixed Size Mesh Processor Array" in Proceedings of the 5th International Parallel Processing Symposium, 1991, pp. 170-177.
[208] C. Jesshope (Ed.): "Major Advances in Parallel Processing," The Technical Press - Unicom, 1986, Appendix II - Product Summaries.
[209] D. Johnson et al.: "Automatic Partitioning of Programs in Multiprocessor Systems" in Proceedings of the IEEE COMPCON 1980, pp. 175-178.

[210] E. E. Johnson: "Completing an MIMD Multiprocessor Taxonomy" in Computer Architecture News, Vol. 16, No. 3 (June 1988), pp. 44-47.
[211] W. K. Johnson: "Massively Parallel Computing System for Research and Development Applications" in Proceedings of the 2nd Symposium on the Frontiers of Massively Parallel Computation, pp. 407-411.
[212] C. Kamath and S. Weeratunga: "Implementation of Two Projection Methods on a Shared Memory Multiprocessor: DEC VAX 6240" in Parallel Computing, Vol. 16, No. 2/3 (1990), pp. 375-380.
[213] A. Kapauan et al.: "The Pringle Parallel Computer" in 11th Annual International Symposium on Computer Architecture, 1984, pp. 12-20.
[214] I. Kaplan: "The LDF-100: A Large Grain Dataflow Parallel Processor" in Computer Architecture News, Vol. 15, No. 3 (June 1987), pp. 5-12.
[215] W. J. Karplus: "Vector Processors and Multiprocessors" in K. Hwang and D. DeGroot (Eds.): "Parallel Processing for Supercomputers & Artificial Intelligence," McGraw-Hill, 1989, pp. 3-30.
[216] H. Kasahara et al.: "Parallel Processing for the Solution of Sparse Linear Equations on OSCAR (Optimally Scheduled Advanced Multiprocessor)" in C. R. Jesshope and K. D. Reinartz (Eds.): CONPAR 88, Cambridge University Press, 1989, pp. 144-151.
[217] H. Kasahara, H. Honda and S. Narita: "Parallel Processing of Near Fine Grain Tasks Using Static Scheduling on OSCAR (Optimally Scheduled Advanced Multiprocessor)" in Proceedings of Supercomputing '90, pp. 856-864.
[218] R. H. Katz and D. A. Patterson: "A VLSI RISC Multiprocessor Workstation" in Proceedings of the IEEE 1986 International Conference on Computer Design, pp. 94-96.
[219] J. A. Katzman: "The Tandem 16: A Fault Tolerant Computing System" in D. P. Siewiorek, C. G. Bell and A. Newell: "Computer Structures: Principles and Examples," McGraw-Hill Computer Science Series, 1985 (original article from 1975), pp. 470-485.
[220] E. W. Kent, M. O. Shneier and R. Lumia: "PIPE: (Pipelined Image Processing Engine)" in Journal of Parallel and Distributed Computing, Vol. 2, No. 1 (February 1985), pp. 50-78.
[221] E. W. Kent and S. L. Tanimoto: "Hierarchical Cellular Logic and the PIPE Processor: Structural and Functional Correspondence" in Proceedings of the 1985 IEEE Computer Society Workshop on Computer Architecture for Pattern Analysis and Image Database Management, IEEE Computer Society Press, pp. 311-319.
[222] M. Kidode: "Image Processing Machines in Japan" in IEEE Computer, Vol. 16, No. 1 (January 1983), pp. 68-80.
[223] H. Kikuchi: "Presto: A Bus Connected Multiprocessor of a Rete-Based Production System" in H. Burkhart (Ed.): CONPAR '90 - VAPP IV, Springer-Verlag LNCS 457, pp. 63-74.
[224] M. J. Kimmel et al.: "MITE: Morphic Image Transform Engine - An Architecture for Reconfigurable Pipelines of Neighborhood Processors" in Proceedings of the 1985 IEEE Computer Society Workshop on Computer Architecture for Pattern Analysis and Image Database Management, IEEE Computer Society Press, pp. 483-500.
[225] D. Kirk an

[229] W. Kluge and C. Schmittgen: "Reduction Languages and Reduction Systems" in P. Treleaven and M. Vanneschi (Eds.): "Future Parallel Computers," Springer-Verlag LNCS 272, 1987, pp. 153-184.
[230] R. Kober and C. Kuznia: "SMS - A Multiprocessor Architecture for High Speed Numerical Applications" in Proceedings of the 1978 International Conference on Parallel Processing, pp. 18-23.
[231] T. Kondo et al.: "Pseudo-MIMD Array Processor - AAP2" in 13th Annual International Symposium on Computer Architecture, 1986, pp. 330-337.
[232] J. Konicek et al.: "The Organization of the CEDAR System" in Proceedings of the 20th International Conference on Parallel Processing, 1991, pp. (I-)49-56.
[233] H. Kopp: "Numerical Weather Forecasting With the Multi-Microprocessor System SMS-201" in M. Feilmeier (Ed.): "Parallel Computers - Parallel Mathematics," North-Holland, 1977, pp. 265-268.
[234] J. S. Kowalik (Ed.): "Parallel MIMD Computation: HEP Supercomputer and its Applications," MIT Press, 1985.
[235] B. Kruse, B. Gudmundsson and D. Antonsson: "PICAP and Relational Neighborhood Processing in FIP" in K. Preston and L. Uhr (Eds.): "Multicomputers and Image Processing, Algorithms and Programs," Academic Press, 1982, pp. 31-46.
[236] B. Kruse and B. Gudmundsson: "Parallelism in PICAP" in K. Preston and L. Uhr (Eds.): "Multicomputers and Image Processing, Algorithms and Programs," Academic Press, 1982, pp. 231-240.
[237] F.-D. Kübler: "Cluster Oriented Architecture for the Mapping of a Parallel Processor Network to High Performance Applications" in Proceedings of the 2nd International Conference on Supercomputing, 1988, ACM Press, pp. 179-189.
[238] F.-D. Kübler and F. Lücking: "A Cluster-Oriented Architecture for the Mapping of Parallel Processor Networks to High-Performance Applications" in Proceedings of the 1988 International Conference on Supercomputing, pp. 179-189.
[239] D. J. Kuck: "A Survey of Parallel Machine Organization and Programming" in Computing Surveys, Vol. 9, No. 1 (March 1977), pp. 29-59.
[240] D. J. Kuck: "The Structure of Computers and Computation," Wiley, 1978.
[241] R. M. Kuczewski, M. H. Mayers and W. J. Crawford: "Neurocomputer Workstations and Processors: Approaches and Applications" in IEEE First International Conference on Neural Networks, 1987, Vol. 3, pp. 487-500.
[242] A. V. Kulkarni and D. W. L. Yen: "Systolic Processing and an Implementation for Signal and Image Processing" in IEEE Transactions on Computers, Vol. C-31, No. 10 (October 1982), pp. 1000-1009.
[243] H. T. Kung: "WARP Experience: We Can Map Computations onto a Parallel Computer Efficiently" in Proceedings of the 2nd International Conference on Supercomputing, 1988, ACM Press, pp. 668-675.
[244] J. H. Lala and C. J. Smith: "Performance and Economy of a Fault-Tolerant Multiprocessor" in AFIPS Conference Proceedings, Vol. 48, 1979 National Computer Conference, pp. 482-492.
[245] A. R. Larrabee, K. E. Pennick and S. M. Stern: "BBN Butterfly Parallel Processor" in R. G. Babb II (Ed.): "Programming Parallel Processors," Addison-Wesley, 1988, pp. 43-57.
[246] R. M. Lea: "VLSI & WSI Associative String Processors for Computer Vision" in I. Page (Ed.): "Parallel Architectures and Computer Vision," Oxford Science Publications, 1988, pp. 283-297.
[247] R. M. Lea: "VLSI & WSI Associative String Processors for Structured Data Processing" in IEE Proceedings, Vol. 133, Part E, No. 3, 1986, pp. 153-162.
[248] R. M. Lea: "SCAPE: A Single Chip Array Processing Element for Signal and Image Processing" in IEE Proceedings, Vol. 133, Part E, No. 3, 1986, pp. 145-151.
[249] R. M. Lea: "ASP: A Cost Effective Parallel Microcomputer" in IEEE Micro, Vol. 8, No. 5 (October 1988), pp. 10-29.
[250] R. M. Lea: "WASP: A Wafer Scale Massively Parallel Computer" in 1990 Proceedings of the International Conference on Wafer Scale Integration, pp. 36-42.

[251] M. Lease and M. Lively: "Comparing Production Machine Architectures" in Computer Architecture News, Vol. 16, No. 4 (September 1988), pp. 108-116.
[252] P. A. Lee: "Parallel Processing on the Multimax Computer System" in C. Jesshope (Ed.): "Major Advances in Parallel Processing," The Technical Press - Unicom, 1986, pp. 204-210.
[253] M. D. P. Leland and W. D. Roome: "The Silicon Database Machine: Rationale, Design and Results" in M. Kitsuregawa and H. Tanaka (Eds.): "Database Machines and Knowledge Base Machines," Kluwer Academic Publishers, 1988, pp. 311-324.
[254] D. Lenoski et al.: "The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor" in Proceedings of the 17th Annual International Symposium on Computer Architecture, 1990, pp. 148-159.
[255] D. Lenoski et al.: "Design of a Scalable Shared-Memory Multiprocessor: The DASH Approach" in Proceedings of the IEEE COMPCON Spring '90, pp. 62-67.
[256] D. Lenoski et al.: "The DASH Prototype: Implementation and Performance" in Proceedings of the 19th Annual International Symposium on Computer Architecture, 1992, pp. 92-103.
[257] G. R. Lewis, J. S. Henry and B. P. McCune: "The BTI-8000 Homogeneous, General-Purpose Multiprocessor" in AFIPS Conference Proceedings, Vol. 48, 1979 National Computer Conference, pp. 513-528.
[258] S. L. Lillevik: "The Touchstone 30GF DELTA Prototype" in Proceedings of the 6th Distributed Memory Computing Conference (DMCC VI), IEEE Press 2290, April 1991, pp. 671-677.
[259] J. R. Lineback: "Parallel Processing: Why a Shakeout Nears" in Electronics, Vol. 58, No. 43 (October 1985), pp. 32-34.
[260] R. M. Lougheed and D. L. McCubbrey: "The Cytocomputer: A Practical Pipelined Processor" in Proceedings of the 7th Annual Symposium on Computer Architecture, 1980, pp. 271-277.
[261] O. A. McBryan: "New Architectures: Performance Highlights and New Algorithms" in Parallel Computing, Vol. 7, 1988, pp. 477-499.
[262] J. V. McCanny and J. G. McWhirter: "Some Systolic Array Developments in the United Kingdom" in IEEE Computer, Vol. 20, No. 7 (July 1987), pp. 51-63.
[263] D. McLeman: "A Parallel UNIX System" in C. Jesshope (Ed.): "Major Advances in Parallel Processing," The Technical Press - Unicom, 1986, pp. 155-166.
[264] J. McLeod: "Ardent Launches First 'Supercomputer on a Desk'" in Electronics, March 3, 1988, pp. 65-67.
[265] E. Maehle: "Multiprocessor Testbed DIRMU 25: Efficiency and Fault Tolerance" in G. Paul and G. Almasi (Eds.): "Parallel Systems and Computations," Elsevier Science Publishers (North-Holland), 1988, pp. 149-163.
[266] R. Männer: "Hardware Task/Processor Scheduling in a Polyprocessor Environment" in IEEE Transactions on Computers, Vol. C-33, No. 7 (July 1984), pp. 626-636.
[267] R. Männer et al.: "Multiprocessor Simulation of Neural Networks with NERV" in Proceedings of Supercomputing '89, pp. 457-465.
[268] R. Männer, O. Stucky and W. Ludwig: "The Heidelberg Polyp Multiprocessor Project" in C. R. Jesshope and K. D. Reinartz (Eds.): CONPAR 88, Cambridge University Press, 1989, pp. 456-462.
[269] T. Manuel: "Convex Unwraps its First Grown-Up Supercomputer" in Electronics, March 3, 1988, pp. 59-61.
[270] T. Manuel: "How Sequent's New Model Outruns Most Mainframes" in Electronics, May 28, 1987, pp. 76-78.
[271] C. Maples and D. Logan: "The Advantage of an Adaptive Multiprocessor Architecture" in A. Wouk (Ed.): "New Computing Environments: Parallel, Vector and Systolic," SIAM, 1986, pp. 154-178.
[272] M. Maresca and H. Li: "A VLSI Implementation of Polymorphic-Torus Architecture" in S. Winter and H. Schumny (Eds.): "Supercomputers: Technology and Applications," Proceedings of the 14th EUROMICRO Symposium on Microprocessing and Microprogramming (EUROMICRO '88), pp. 737-742.

[273] M. P. Mariam and E. J. Henry: "PEPE - A User's Viewpoint; a Powerful Real-Time Adjunct" in AFIPS Conference Proceedings, Vol. 47, 1978 National Computer Conference, pp. 993-1002.
[274] P. B. Mark: "The Sequoia Computer: A Fault Tolerant Tightly Coupled Multiprocessor Architecture" in Proceedings of the 12th Annual International Symposium on Computer Architecture, p. 232.
[275] M. A. Marsan et al.: "Modeling the Software Architecture of a Prototype Parallel Machine" in ACM SIGMETRICS Performance Evaluation Review, Vol. 15, No. 1 (May 1987), pp. 175-185.
[276] A. J. Martin: "A Message Passing Model for Highly Concurrent Computation" in Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988, Vol. I, pp. 520-527.
[277] O. E. Marvel: "HAPPE - The Honeywell Associative Parallel Processing Ensemble" in Proceedings of the 1st Annual Symposium on Computer Architecture, pp. 261-267.
[278] MasPar Computer Corp.: October 6, 1992, release announcing the MasPar MP-2.
[279] N. Matelan: "The FLEX/32 Multicomputer" in Proceedings of the 12th Annual International Symposium on Computer Architecture, 1985, pp. 209-213.
[280] H. Matsuda et al.: "Parallel Prolog Machine PARK: Its Hardware Structure and Prolog System" in E. Wada (Ed.): Logic Programming '85, Proceedings of the 4th Conference, Springer-Verlag LNCS 221, 1985, pp. 35-43.
[281] H. Matsushima, T. Uno and M. Ejiri: "An Array Processor for Image Processing" in M. Onoe, K. Preston, Jr. and A. Rosenfeld (Eds.): "Real Time/Parallel Computers," Plenum Press, 1981, pp. 325-338.
[282] MCC Parallel Processing Survey, 1984.
[283] Meiko: "The Meiko Computing Surface: An Example of a Massively Parallel System" in Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988, Vol. I, pp. 852-859.
[284] W. C. Meilander: "History of Parallel Processing at Goodyear Aerospace" in Proceedings of the 1981 International Conference on Parallel Processing, pp. 5-15.
[285] L. Meng: "Analysis of Some Experimental Results for the TDM" in H. Boral and P. Faudemay (Eds.): "Database Machines," Springer-Verlag LNCS 368, 1989, pp. 373-386.
[286] A. Mérigot, B. Zavidovique and F. Devos: "SPHINX, A Pyramidal Approach to Parallel Image Processing" in Proceedings of the Workshop on Computer Architecture for Pattern Analysis and Image Database Management, IEEE Computer Society Press, 1985, pp. 107-111.
[287] J. Milde, T. Plückebaum and W. Ameling: "Synchronous Communication of Cooperating Processes in the M5PS Multiprocessor" in W. Händler et al. (Eds.): CONPAR '86, Springer-Verlag LNCS 237, 1986, pp. 142-148.
[288] P. C. Miller, C. E. St. John and S. W. Hawkinson: "FPS-T Series" in R. G. Babb II (Ed.): "Programming Parallel Processors," Addison-Wesley, 1988, pp. 73-91.
[289] N. N. Mirenkov: "High-Performance Computer System 'Siberia'" in H. Burkhart (Ed.): CONPAR '90 - VAPP IV, Springer-Verlag LNCS 457, pp. 806-815.
[290] J. Miyazaki et al.: "A New Version of a Parallel Production System Machine MANJI-II" in H. Boral and P. Faudemay (Eds.): "Database Machines," Springer-Verlag LNCS 368, 1989, pp. 317-330.
[291] K. B. Modahl: "Cray X-MP" in R. G. Babb II (Ed.): "Programming Parallel Processors," Addison-Wesley, 1988, pp. 59-72.
[292] N. Mokhoff: "Parallelism Breeds a New Class of Supercomputers" in Computer Design, March 15, 1987, pp. 53-64.
[293] S. Momoi et al.: "Hierarchical Array Processor System (HAP)" in W. Händler et al. (Eds.): CONPAR 86, Springer-Verlag LNCS 237, 1986, pp. 311-318.
[294] L. Monier and P. Sindhu: "The Architecture of DRAGON" in Spring COMPCON '85, pp. 118-121.
[295] N. Morgan et al.: "The RAP: A Ring Array Processor for Layered Network Calculations" in Proceedings of the International Conference on Application Specific Array Processors, 1990, pp. 296-308.

[296] N. Morgan et al.: "The Ring Array Processor: A Multiprocessing Peripheral for Connectionist Applications" in Journal of Parallel and Distributed Computing, Vol. 14, No. 3 (March 1992), pp. 248-259.
[297] S. G. Morton: "A Fault Tolerant, Bit-Parallel, Cellular Array Processor" in Proceedings of the 1986 Fall Joint Computer Conference, pp. 277-286.
[298] S. G. Morton, E. Abreu and F. Tse: "ITT CAP - Towards the Personal Supercomputer" in IEEE Micro, Vol. 5, No. 6 (December 1985), pp. 37-49.
[299] R. Mount: "Alternatives in High-Volume HEP Computing" in L. O. Hertzberger and W. Hoogland (Eds.): Proceedings of the Conference on Computing in High Energy Physics, North-Holland, 1986, pp. 107-122.
[300] K. Murakami et al.: "An Overview of the Kyushu University Reconfigurable Parallel Processor" in Computer Architecture News, Vol. 16, No. 4 (September 1988), pp. 130-137.
[301] Y. Muraoka and T. Marushima: "Major Research Activities in Parallel Processing in Japan" in E. N. Houstis, T. S. Papatheodorou and C. D. Polychronopoulos (Eds.): 1st International Conference on Supercomputing (1987), Springer-Verlag LNCS 297, 1987, pp. 836-853.
[302] S.-i. Nakamura et al.: "A High Speed Database Machine, HDM" in M. Kitsuregawa and H. Tanaka (Eds.): "Database Machines and Knowledge Base Machines," Kluwer Academic Publishers, 1988, pp. 237-250.
[303] T. Nash: "Event Parallelism: Distributed Memory Parallel Computing for High Energy Physics Experiments" in Computer Physics Communications, Vol. 57 (1989), pp. 47-56.
[304] R. Natarajan and R. Marinelli: "The LINPACK Benchmark on the IBM POWER Visualization System," IBM Research Report RC 17964 (#78968), April 5, 1992.
[305] E. Nestle and A. Inselberg: "The Synapse N+1 System: Architectural Characteristics and Performance Data of a Tightly Coupled Multiprocessor System" in Proceedings of the 12th Annual International Symposium on Computer Architecture, 1985, pp. 233-239.
[306] J. R. Nickolls: "The Design of the MasPar MP-1: A Cost Effective Massively Parallel Computer" in Proceedings of the IEEE COMPCON Spring '90, pp. 25-28.
[307] H. Nishimura et al.: "LINKS-1: A Parallel Pipelined Multiprocessor System for Image Creation" in Proceedings of the 10th Annual International Symposium on Computer Architecture, 1983, pp. 387-396.
[308] T. Nishitani et al.: "A Real-Time Software Programmable Processor for HDTV and Stereo Scope Signals" in Proceedings of the International Conference on Application Specific Array Processors, 1990, pp. 226-234.
[309] M. Noakes et al.: "The J-Machine Multicomputer: An Architectural Evaluation" in Proceedings of the 20th International Symposium on Computer Architecture, May 1993.
[310] E. A. M. Odijk: "POOMA, POOL and Parallel Symbolic Computing: An Assessment" in H. Burkhart (Ed.): CONPAR '90 - VAPP IV, Springer-Verlag LNCS 457, pp. 26-39.
[311] P. R. Oestreicher: "The ES-1: A Supercomputing Architecture for the 90's" in Proceedings of the IEEE COMPCON Spring '90, pp. 16-19.
[312] R. O'Gorman: "The RPA - Making the Array Approach Acceptable" in C. Jesshope (Ed.): "Major Advances in Parallel Processing," The Technical Press - Unicom, 1986, pp. 130-145.
[313] R. Ohbuchi: "Overview of Parallel Processing in Japan" in Parallel Computing, Vol. 2 (1985), pp. 219-228.
[314] Y. Okada, H. Tajima and R. Mori: "A Reconfigurable Parallel Processor with Microprogram Control" in IEEE Micro, Vol. 2, No. 4 (November 1982), pp. 48-60.
[315] J. F. Palmer: "The NCUBE Family of High-Performance Parallel Computer Systems" in Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988, Vol. I, pp. 847-851.
[316] G. M. Papadopoulos and D. E. Culler: "Monsoon: An Explicit Token-Store Architecture" in Proceedings of the 17th Annual International Symposium on Computer Architecture, 1990, pp. 82-91.

[317) D. M. Pase and A. R. Larrabee: "iPSC" in R. G. Babb II (Ed.): "Programming Parallel Processors," Addison-Wesley, 1988, pp/105-124. [318) S. Pass: "The GRID Parallel Computer System" in J. Kittler and M. J. B. Duff (Eds.): "Image Processing Systems Architecture," Research Studies Press, 1985, pp/23-35. [319) W. Paul and D. Scheerer: "The DATIS-P Parallel Machine" in Proceedings of the Twenty-Fourth Hawaii International Conference on System Sciences, 1991, Vol. I, pp/560-571. [320) K. Peinze: "The SUPRENUM Prototype: Status and Experiences" in Parallel Com• puting, Vol. 7 (1988), pp/297-313. [321) A. Perez et al.: "OUPPI-1, A SIMD Computer Using Integrated Parallel Processors" in C. R. Jesshope and K. D. Reinartz (Eds.): CONPAR 88, Cambridge University Press, 1989, pp/205-212. [322) C. Peterson, J. Sutton and P. Wiley: "iWARP: A 100-MOPS LIW Microprocessor for Multicomputers" in IEEE Micro, June 1991, pp/26-29, 81-87. [323) S. L. Peyton-Jones et al.: "GRIP - A High Performance Architecture for Parallel Graph Reduction" in G. Kahn (Ed.): "Functional Programming Languages and Computer Architecture," Springer-Verlag LNCS 274, 1987, pp/98-112. [324) Philips Research Laboratories: "POOL and DOOM: A Survey of ESPRIT 415 Subpro• ject A" in E. Odijk, M. Rem and J.-C. Syre (Eds.): "PARLE '89 - Parallel Architec• tures and Languages Europe," Springer-Verlag LNCS 366, 1989, Vol. I, pp/356-373. [325) G.F. Pfister et al.: "The IBM Research Parallel Processor Prototype (RP3), Introduc• tion and Architecture" in Proceedings of the 1985 International Conference on Parallel Processing, pp/764-771. [326) F. Pollack et al.: "A VLSI-Intensive Fault Tolerant Computer Architecture" in Pro• ceedings of the IEEE COMPCON Spring '90, pp/134-142. [327) F. Pollack et al.: "An Object-Oriented Distributed Operating System" in Proceedings of the IEEE COMPCON Spring '90, pp/143-152. [328) C. A. Prete: "RST Cache Memory Design for A Tightly Coupled Multiprocessor Sys• tem" in IEEE Micro, April 1991, pp/16-19, 40-52. [329) "The Princeton Engine: A Massively Parallel Computer for Real-time, Continuous Video and Image Processing," David Sarnoff Research Center. [330) B. J. Procter and C. J. Skelton: "Flagship is Nearing Port" in C. R. Jesshope and K. D. Reinartz (Eds.): CONPAR 88, Cambridge University Press, 1989, pp/100-107. [331) R. Raghavan, K. K. Jung and H. T. Nguyen: "Fine Grain Parallel Processors and Real• Time Applications: MIMD Controller/ SIMD Array" in Proceedings of the International Conference on Pattern Recognition, 1990, Vol. 2, pp/324-331. [332) B. Ramakrishna Rau et al.: "The Cydra 5 Departmental Supercomputer" in IEEE Computer, Vol. 22, No.1 (January 1989), pp/12-35. [333) G. Raupp and H. Richter: "The MULTITOP Parallel Computers for ASDEX-Upgrade" in Proceedings of the 20th International Conference on Parallel Processors, 1991, pp/(I-)656--£57. [334) "RCA Corporation Model 215 Military Computer" in P. H. Enslow, Jr. (Ed.): "Multi• processors and Parallel Processing," Wiley, 1974, pp/257-263. [335) S. Reinhardt: "The Parallel Processing Aspects ofthe CRAY Y-MP Computer System" in Proceedings of the 1988 International Conference on Parallel Processing, pp/311-314. [336) S. F. Reddaway: "DAP: A Distributed Array Processor" in 1st Annual Symposium on Computer Architecture, 1973, pp/61-70. [337) R. T. Ritchings: ''The CYBA-M Multiprocessor for Image Processing" in J. Kittler and M. J. B. Duff (Eds.): "Image Processing Systems Architecture," Research Studies Press, 1985, pp/103-114. [338) I. N. 
Robinson and W. R. Moore: "A Parallel Processor Array and its Implementation in Silicon" in IEEE 1980 Custom Integrated Circuits Conference, pp/41-45. [339] J. Rose, W. Loucks and Z. Vranesic: "FERMATOR: A Tunable Multiprocessor Architecture" in IEEE Micro, Vol. 5, No. 4 (August 1985), pp/5-17. [340] A. Rosenblatt: "Engineering Workstations: What the Vendors Wrought" in IEEE Spectrum, April 1992, pp/38-42.

[341] J. A. Rudolph: "A Productive Implementation of an Associative Array Processor: STARAN" in AFIPS Conference Proceedings, Vol. 41, Part I, Fall Joint Computer Conference, 1972, pp/229-241; also appeared in [399], pp/317-331. [342] J. A. Rudolph and K. E. Batcher: "STARAN Parallel Processor System Hardware" in AFIPS Conference Proceedings, Vol. 43, 1974 National Computer Conference, pp/405- 410. [343] L. Rudolph et al.: "Envelopes in Adaptive Local queues for MIMD Load Balancing," CONPAR '92 - VAPP V, July 1992, pp/479-484. [344] A. B. Ruighaver: "A Decoupled Multicomputer Architecture with Optical Full Inter• connect" in L. Bouge et al. (Eds.): Proceedings of Parallel Processing CONPAR '92 - VAPP V, Springer-Verlag LNCS 634, pp/405--41O. [345] D. Rutovitz and A. J. Travis: "Parallel Processing Architectures for Image Analysis" in C. Jesshope (Ed.): "Major Advances in Parallel Processing," The Technical Press• Unicom, 1986, pp/288-306. [346] C. Saito et al.: "An Adaptable Cluster Structure of the (SM)2-II" in N. Handler et al. (Eds.): Proceedings of CONPAR '86, Springer-Verlag LNCS 237, 1986, pp/53--60. [347] C. L. Saitz et al.: "The Architecture and Programming of the Ametek Series 2010 Multi• computer" in Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988, Vol. I, pp/33-36. [348] S. Sakai et al.: "An Architecture of a Dataflow Single Chip Processor" in Proceedings of the 16th Annual International Symposium on Computer Architecture, pp/46--53. [349] S. Sakai, Y. Komada and Y. Yamaguchi: "Design and Implementation of a Versatile Interconnection Network in the EM-4" in Proceedings of the 20th International Con• ference on Parallel Processors, 1991, pp/(I-)426-430. [350] S. Sakai, Y. Komada and Y. Yamaguchi: "Prototype Implementation of a Highly Paral• lel Dataflow Machine EM-4" in Proceedings of the 5th International Parallel Processing Symposium, 1991, pp/278-286. [351] "Sanders Associates OMEN-60 Orthogonal Computers" in P. H. Enslow, Jr. (Ed.): "Multiprocessors and Parallel Processing," Wiley, 1974, pp/264-273. [352] J. Sanguinetti and B. Kumar: "Performance of a Message-Based Multiprocessor" in Proceedings of the 12th Annual International Symposium on Computer Architecture, pp/424-425. [353] S. Sasaki et al.: "High Speed Pipelined Image Processor" in S. P. Kartashev and I. S. Kartashev (Eds.): Supercomputing Systems, Proceedings of the 1st International Conference, 1985, pp/476--484. [354] H. Sato et al.: "Fast Image Generation of Constructive Solid Geometry Using a Cellular Array Processor" in H. K. Reghbati and A. Y. C. Lee (Eds.): "Computer Graphics Hardware - Image Generation and Display," IEEE Computer Society Press, 1988, pp/302-309. [355] M. Sato, H. Matsuura, H. Ogawa and T. Iijima: "Multimicroprocessor System PK-1" in K. Preston Jr. and L. Uhr (Eds.): "Multicomputers and Image Processing, Algorithms and Programs," Academic Press, 1982, pp/361-371. [356] M. Satyanarayanan: "Multiprocessors, A Comparative Study," Prentice-Hall, 1980. [357] J. Savage: "Parallel Processing as a Language Design Problem" in Proceedings of the 12th Annual International Symposium on Computer Architecture, 1985, pp/221-224. [358] D. H. Schaefer et al.: "The GAM Pyramid" in L. Uhr (Ed.): "Parallel Computer Vision," Academic Press, 1987, pp/15-42. [359] U. Schmidt and K. Caesar: "Datawave: A Single-Chip Multiprocessor for Video Appli• cations" in IEEE Micro, June 1991, pp/22-25, 88-94. [360] L. A. Schmitt and S. J. 
Wilson: "The AIS-5000 Parallel Processor" in IEEE Transac• tions on Pattern Analysis and Machine Intelligence, Vol. 10, No.3 (May 1988), pp/320- 330. [361] C. L. Seitz: "The Cosmic Cube" in Communications of the ACM, Vol. 28 No.1 (January 1985), pp/22-33. [362] The Serlin Report on Parallel Processing, No. 39/40 (August/September 1990), p/lO. [363] The Serlin Report on Parallel Processing, No. 13 (June 1988), p/6. 140 Bibliography

[364] The Serlin Report on Parallel Processing, No. 11 (April 1988), p/8. [365] The Serlin Report on Parallel Processing, No. 37 (June 1990), p/10. [366] The Serlin Report on Parallel Processing, No. 11 (April 1988), p/9. [367] The Serlin Report on Parallel Processing, No. 26 (July 1989), pp/4-7. [368] The Serlin Report on Parallel Processing, No. 11 (April 1988), p!lO. [369] The Serlin Report on Parallel Processing, No. 39/40 (August/September 1990), pp/8- 10. [370] The Serlin Report on Parallel Processing, No. 15 (August 1988), p/lO. [371] The Serlin Report on Parallel Processing, No.5 (October 1987), p/12. [372] The Serlin Report on Parallel Processing, No. 11 (April 1988), p/1l. [373] The Serlin Report on Parallel Processing, No. 24 (May 1989), pp/3-5. [374] The Serlin Report on Parallel Processing, No. 39/40 (August/September 1990), pp/6-8. [375] The Serlin Report on Parallel Processing, No. 32/33 (February 1990), p/l. [376] The Serlin Report on Parallel Processing, No. 26 (July 1989), pp/7-1O. [377] The Serlin Report on Parallel Processing, No. 25 (June 1989), p/9. [378] The Serlin Report on Parallel Processing, No. 39/40 (August/September 1990), p/18. [379] The Serlin Report on Parallel Processing, No. 25 (June 1989), p/8. [380] The Serlin Report on Parallel Processing, No. 26 (July 1989), pp/1O-13. [381] The Serlin Report on Parallel Processing, No.3 (August 1987), p/13. [382] The Serlin Report on Parallel Processing, No. 26 (July 1989), pp/13-14. [383] The Serlin Report on Parallel Processing, No. 15 (August 1988), p/1l. [384] The Serlin Report on Parallel Processing, No. 10 (March 1988), pp/5-12. [385] The Serlin Report on Parallel Processing, No. 31 (December 1989), pp/6-7. [386] S. Shams and K. W. Przytula: "Mapping of Neural Networks onto Programmable Par• allel Machines" in Proceedings of the IEEE International Symposium on Circuits and Systems, 1990, Vol. 4, pp/2613-2617. [387] D. E. Shaw: "SIMD and MSIMD Variants of the NON-VON Supercomputer" in Pro• ceedings of the COMPCON S'84, pp/360-363. [388] T. Shibuya et al.: "Application Specific Massively Parallel Machine" in Proceedings of the 3rd Symposium on the Frontiers of Massively Parallel Computation, 1990, pp/274- 277. [389] Y. Shih and J. Fier: "Hypercube Systems and Key Applications" in K. Hwang and D. DeGroot (Eds.): "Parallel Processing for Supercomputers & Artificial Intelligence," McGraw-Hill, 1989, pp/203-244. [390] T. Shimizu, T. Horie and H. Ishihata: "Low-Latency Message Communication Sup• port for the AP1000" in Proceedings of the 19th Annual International Symposium on Computer Architecture, 1992, pp/288-297. [391] T. Shimuzu: Personal Communication, October 1992. [392] T. Shirakawa et al.: "QCDPAX - An MIMD Array of Vector Processors for the Nu• merical Simulation of Quantum Chromodynamics" in Proceedings of Supercomputing '89, pp/495-504. [393] R. R. Shively et al.: "A High Performance Reconfigurable Parallel Processing Architec• ture" in Proceedings of Supercomputing '89, pp/505-509. [394] H. Shomberg: "A Transputer-Based Shuffle-Shift Machine for Image Understanding and Reconstruction" in Proceedings of the 10th International Conference on Pattern Recognition, 1990, pp/445-450. [395] H. J. Siegel et al.: "Communication Techniques in Parallel Processing" in R. Dierstein, D. Miiller-Wichards and H.-M. Wacker (Eds.): "Parallel Computing in Science and En• gineering," 4th International DFVLR Seminar on Foundations of Engineering Sciences, Springer-Verlag LNCS 295, 1987, pp/35-60. [396] H. J. 
Siegel et al.: "PASM: A Reconfigurable Parallel System for Image Processing" in Computer Architecture News, Vol. 12, No.4 (September 1984), pp/7-20. [397] H. J. Siegel et al.: "PASM: A Partitionable SIMD/MIMD System for Image Process• ing and Pattern Recognition" in IEEE Transactions on Computers, Vol. C-30, No. 12 (December 1981), pp/934- 947. Bibliography 141

[398] M. Siegle and R. Hofmann: "Monitoring Program Behaviour on SUPRENUM" in Pro• ceedings of the 19th Annual International Symposium on Computer Architecture, 1992, pp/332-341. [399] D. P. Siewiorek, C. G. Bell and A. Newell: "Computer Structures: Principles and Examples," McGraw-Hill Computer Science Series, 1985, pp/99-104. [400] W. Simhoefer: Personal Communication, September 1992. [401] A. C. Sleigh and P. K. Bailey: "DIPOD: An Image Understanding Development and Implementation System" in Pattern Recognition Letters, Vol. 6, No.2 (July 1987), pp/l0l-106. [402] A. C. Sleigh, C. J. Radford and G. J. Harp: "RSRE Experience Implementing Computer Vision Algorithms on Transputers, DAP and DIPOD Parallel Processors" in 1. Page (Ed.): "Parallel Architectures and Computer Vision," Oxford Science Publications, 1988, pp/133-155. [403] D. L. Slotnick: "Centrally Controlled Parallel Processors" in Proceedings of the 1981 International Conference on Parallel Processing, pp/I6-24. [404] G. J. M. Smit and P. G. Jansen: "The Communication Processor of TUMULT-64" in S. Winter and H. Schumny (Eds.): "Supercomputers: Technology and Applications," Proceedings of the 14th EUROMICRO Symposium on Microprocessing and Micropro• gramming (EUROMICRO '88), pp/519-524. [405] B. J. Smith: "A Pipelined, Shared Resource MIMD Computer" in Proceedings of the 1978 International Conference on Parallel Processing, pp/6-8. [406] B. J. Smith: "Shared Memory, Vectors, Message Passing, and Scalability" in R. Dier• stein, D. Miiller-Wichards and H.-M. Wacker (Eds.): "Parallel Computing in Science. and Engineering," 4th International DFVLR Seminar on the Foundations of Engineering Sciences, Springer-Verlag LNCS 295, 1987, pp/29-34. [407] K. Smith: "New Computer Breed Uses Transputers For Parallel Processing" in Elec- tronics, February 24, 1983, pp/67-68. [408] M. Snir: Personal Communication, October 1989. [409] M. Snir: Personal Communication, June 1993. [410] L. Snyder: "Introduction to the Configurable, Highly Parallel Computer" in IEEE Computer, January 1982, pp/47-56. [411] L. Snyder: "A Taxonomy of Synchronous Parallel Machines" in Proceedings ofthe 1988 International Conference on Parallel Processing, pp/281-285. [412] D. Steele and N. Clark: "The Evolution of a Real-Time Tightly-Coupled Parallel Pro• cessing System" in C. Jesshope (Ed.): "Major Advances in Parallel Processing," The Technical Press - Unicorn, 1986, pp/179-192. [413] P. Stenstrom and L. Philipson: "A Layered Emulator for Design Evaluation of MIMD Multiprocessors with Shared Memory" in PARLE '87 - Proceedings of Parallel Archi• tectures and Languages Europe, 1987, Springer-Verlag LNCS 258, pp/329-344. [414] S. R. Sternberg: "Language and Architecture for Parallel Image Processing" in E. S. Gelsma and L. N. Kanal (Eds.): Proceedings of the Conference on Pattern Recog• nition in Practice, North-Holland, 1980. [415] S. R. Sternberg: "Parallel Architectures for Image Processing" in M. Onoe, K. Pre• ston, Jr. and A. Rosenfeld (Eds.): "Real Time/Parallel Computers," Plenum Press, 1981, pp/347-;359. [416] T. Stiemerling, T. Wilkinson and A. Saulsbury: "Implementing DVSM on the TOPSY Multicomputer" in Proceeding of SEDMS III, Symposium on Experiences with Dis• tributed and Multiprocessor Systems, 1992, pp/263-278. [417] R. A. Stokes: "Burroughs Scientific Processor" in D. J. Kuck, D. H. Lawrie and A. Sameh (Eds.): "High Speed Computer and Algorithm Organization," Academic Press, 1977, pp/85-89. [418] R. Stokes and R. 
Cantarella: "History of Parallel Processing at Burroughs" in Proceed• ings of the 1981 International Conference on Parallel Processing, pp/25-32. [419] S. J. Stolfo and D. P. Miranker: "The DADO Production System Machine" in Journal of Parallel and Distributed Computing, Vol. 3, No.2 (1986), pp/269-296. 142 Bibliography

[420] S. J. Stolfo and D. P. Miranker: "DADO: A Parallel Processor for Expert Systems" in Proceedings of the 1984 International Conference on Parallel Processing, pp/74-82. [421] H. S. Stone: "High-Performance Computer Architecture," Addison-Wesley, 1987. [422] L. Storc: "Sequent Balance" in R. G. Babb II (Ed.): "Programming Parallel Proces• sors," Addison-Wesley, 1988, pp/143-154. [423] L. Stringa: "EMMA - An Industrial Experience on a Large Multiprocessing Archi• tecture" in Proceedings of the 10th Annual International Symposium on Computer Architecture, 1983, pp/326-333. [424] S. Sugimoto et al.: "A Multi-Microprocessor System for Concurrent LISP" in Proceed• ings of the 1988 International Conference on Parallel Processing, pp/135-143. [425] Computer Corporation announcements of the SPARCserver 10 ver• sions of the 600MP Series and of the SPARCstation 10, May 19, 1992. [426] N. Suzuki: "TOP-1 Multiprocessor Workstation" in T. Ito and R. H. Halstead (Eds.): "Parallel LISP, Languages and Systems", Proceedings of the US/Japan Workshop on Parallel LISP, Springer-Verlag LNCS 441, 1990, pp/353-364. [427] J.-C. Syre, D. Comte and N. Hifdi: "Pipelining, Parallelism and Asynchronism in the LAU System" in Proceedings of the 1977 International Conference on Parallel Process• ing, pp/87-92. [428] N. Takahashi and M. Amamiya: "A Data Flow Processor Array System: Design and Analysis" in Proceedings of the 10th Annual International Symposium on Computer Architecture, 1983, pp/243-250. [429] Y. Takahashi and S. Sasaki: "Parallel Automated Wire-Routing with a Number of Competing Processors" in Proceeding of the 1990 International Conference on Super• computing, pp/31O-317. [430] N. Tanabe et al.: "Base-m n-cube: High Performance Interconnection Networks for Parallel Computer PRODIGY" in Proceedings of the 20th International Conference on Parallel Processors, 1991, pp/(I-)509-516. [431] S. L. Tanimoto, T. J. Ligocki and R. Ling: "A Prototype Pyramid Machine for Hier• archical Cellular Logic" in L. Uhr (Ed.): "Parallel Computer Vision," Academic Press, 1987, pp/43-83. [432] T. Temma et al.: "Chip Oriented Dataflow Image Processor: TIP-3" in Proceedings of COMPCON Fall '84, pp/245-254. [433] A. E. Terrano: "The QCD Machine" in B. J. Alder (Ed.): "Special Purpose Computers," Academic Press, 1988, pp/41-65. [434] J. A. Test, M. Myszewski and R. C. Swift: "The Alliant FX/Series: A Language Driven Architecture for Parallel Processing of Dusty Deck Fortran" in PARLE '87 - Parallel Architectures and Languages Europe, Springer-Verlag LNCS 258, Vol. I, pp/345-365. [435] C. P. Thacker and L. C. Stewart: "Firefly, a Multiprocessor Workstation" in IEEE Transactions on Computers, Vol. 37, No. 8D, (August 1988), p/909-920. [436] K J. Thurber: "Parallel Processor Architectures - Part 2: Special Purpose Systems" in Computer Design, Vol. 18, No.2 (February 1979), pp/103-114. [437] S. Toborg and KHwang: "Exploring Neural Networks and Optical Computing Tech• nologies" in K Hwang and D. DeGroot (Eds.): "Parallel Processing for Supercomputers & Artificial Intelligence," McGraw-Hill, 1989, pp/609-660. [438] K Toda et al.: "Preliminary Measurements of the ETL LISP-bound Data Driven Machine" in J. V. Woods (Ed.): "Fifth Generation Computer Architectures," North• Holland, 1986, pp/235-253. [439] B. Tokerud, V. S. Anderson and M. Toverud: "CESAR - the Architecture and Imple• mentation of a High-Performance Systolic Array Processor" in Proceedings of the 1988 International Conference on Parallel Processing, Vol. I, pp/47-50. 
[440] Topology 100 Technical Manuals, Topologix Inc., 1989. [441] P. C. Treleaven: "Control Driven, Data Driven and Demand Driven Computer Architectures" (Extended Abstract) in Parallel Computing, Vol. 2 (1985), pp/287-288. [442] P. C. Treleaven: "Future Parallel Computers" in W. Händler et al. (Eds.): Proceedings of CONPAR '86, Springer-Verlag LNCS 237, 1986, pp/4~7.

[443] P. C. Treleaven: "Parallel Architecture Overview" in Proceedings of the International Conference on Vector and Parallel Processors in Computational Science III (Parallel Computing, Vol. 8, Nos. 1-3, October 1988), pp/59-70. [444] P. C. Treleaven, K. J. Lees, and S. C. McCabe: "Computer Architectures for Artificial Intelligence" in P. Treleaven and M. Vanneschi (Eds.): "Future Parallel Computers," Springer-Verlag LNCS 272, 1987, pp/416-492. [445] P. C. Treleaven, D. R. Brownbridge and R. P. Hopkins: "Data-Driven and Demand-Driven Computer Architectures" in ACM Computing Surveys, Vol. 14, No. 1, 1982, pp/93-143. [446] U. Trottenberg: "SUPRENUM - An MIMD Multiprocessor System for Multi-level Scientific Computing" in W. Händler et al. (Eds.): Proceedings of CONPAR '86, Springer-Verlag LNCS 237, 1986, pp/48-52. [447] J. Tuazon, J. Peterson and M. Pniel: "Mark IIIfp Hypercube Concurrent Processor Architecture" in Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988, Vol. I, pp/71-80. [448] L. W. Tucker and G. G. Robertson: "Architecture and Application of the Connection Machine" in IEEE Computer, Vol. 21, No. 8 (August 1988), pp/26-38. [449] S. R. Vegdahl: "A Survey of Proposed Architectures for the Execution of Functional Languages" in IEEE Transactions on Computers, Vol. C-33, No. 12 (December 1984), pp/1050-1071. [450] A. H. Veen and R. v. d. Born: "The RC Compiler for the DTN Dataflow Computer" in Journal of Parallel and Distributed Computing, Vol. 10 (1990), No. 4, pp/319-332. [451] D. J. Vianney, J. H. Thomas and V. Rabaza: "The Gould NP1 System Interconnection" in Proceedings of the 1988 International Conference on Supercomputing, pp/170-178. [452] C. R. Vick and J. A. Cornell: "PEPE Architecture: Present and Future" in 1978 National Computer Conference, AFIPS Conference Proceedings, Vol. 47, pp/981-992. [453] J. A. Vlontzos and S. Y. Kung: "A Wavefront Array Processor Using Dataflow Processing Elements" in E. N. Houstis, T. S. Papatheodorou and C. D. Polychronopoulos (Eds.): 1st International Conference on Supercomputing (1987), Springer-Verlag LNCS 297, 1987, pp/744-767. [454] M. C. Vlot: "The POOMA Architecture" in P. America (Ed.): Parallel Database Systems, PRISMA Workshop 1990, Springer-Verlag LNCS 507, 1991, pp/365-395. [455] C. Vollum: "The XTM Parallel Desktop Supercomputer: Transputers Play Hot" in Proceedings of the IEEE COMPCON Spring '89, pp/61-62. [456] Z. G. Vranesic et al.: "HECTOR: A Hierarchically Structured Shared-Memory Multiprocessor" in IEEE Computer, Vol. 24, No. 1 (January 1991), pp/72-79. [457] R. A. Wagner: Personal Communication, September 27, 1992. [458] R. S. Wallace and M. D. Howard: "HBA Vision Architecture: Built and Benchmarked" in Proceedings of the Workshop on Computer Architecture for Pattern Analysis and Machine Intelligence, 1987, pp/209-216. [459] "Warp Speed Ahead" in New Products, IEEE Computer, Vol. 24, No. 11 (November 1991), p/75. [460] T. Watanabe: "Advanced Architecture and Technology of Supercomputing Systems" in T. K. S. Murthy and C. A. Brebbia (Eds.): Advances in Computer Technology and Applications in Japan, Springer-Verlag LNE 69, 1991, pp/1-11. [461] P. Watson and I. Watson: "Evaluating Functional Programs on the FLAGSHIP Machine" in G. Kahn (Ed.): "Functional Programming Languages and Computer Architecture," Springer-Verlag LNCS 274, 1987, pp/80-97. [462] C. C.
Weems et al.: "An Overview of Architecture Research for Image Understanding at the University of Massachusetts" in Proceedings of the 10th International Conference on Pattern Recognition, 1990, pp/379-384. [463] C. C. Weems: Personal Communication, September 1992. [464] M. Weiser et al.: "Status and Performance of the ZMOB Parallel Processing System" in Spring COMPCON '85,pp/71-74. [465] C. Whitby-Strevens: "The Transputer" in Proceedings of the 12th Annual International Symposium on Computer Architecture, 1985, pp/292-300. 144 Bibliography

[466] L. C. Widdoes and S. Correll: "The S-1 Project: Developing High-Performance Digital Computers" in Energy and Technology Review, September 1979; reprinted in R. H. Kuhn and D. A. Padua (Eds.): "Tutorial on Parallel Processing," IEEE Computer Society Press, 1981, pp/136-145. [467] L. C. Widdoes: "The Minerva Multi-Microprocessor" in Proceedings of the 3rd Annual Symposium on Computer Architecture, 1976, pp/34-39. [468] T. Williams: "Multiprocessor Boards Boasts Embedded Designs to GigaFLOP Range" in Computer Design, Vol. 30, No. 10 (July 1991), pp/122-123. [469] S. S. Wilson: "The PIXIE-5000 - A Systolic Array Processor" in Proceedings of the 1985 IEEE Computer Society Workshop on Computer Architecture for Pattern Analysis and Image Database Management, IEEE Computer Society Press, pp/477-483. [470] P. Wolcott: Personal Communication, October 21, 1992. [471] P. Wolcott and S. E. Goodman: "Soviet High-Speed Computers: The New Generations" in Supercomputing '90, pp/930-939. [472] J. Worlton: "Some Patterns of Technological Changes in High Performance Computers" in Proceedings of Supercomputing '88, pp/312-320. [473] "Xerox Data Systems SIGMA 9 Computer System" in P. H. Enslow, Jr. (Ed.): "Multiprocessors and Parallel Processing," Wiley, 1974, pp/328-335. [474] L. Xiao-Ping and H. Amano: "A Static Scheduling System for a Parallel Machine (SM)2-II" in E. Odijk, M. Rem and J.-C. Syre (Eds.): "PARLE '89 - Parallel Architectures and Languages Europe," Springer-Verlag LNCS 366, 1989, pp/118-135. [475] Y. Yamaguchi et al.: "EM-3: A LISP Based Data Driven Machine" in Proceedings of the International Conference on Fifth Generation Computer Systems, 1984, pp/524-532. [476] T. Yuba: "Dataflow Computer Development in Japan" in Proceedings of the 1990 International Conference on Supercomputing, pp/140-147. [477] A. Zsóter, T. Legendi and G. Balazs: "Design and Implementation of M1 Cellprocessor" in H. Burkhart (Ed.): CONPAR '90 - VAPP IV, Springer-Verlag LNCS 457, pp/692-696. [478] G. Q. Zuo and A. Z. Wang: "MPS - an Experimental Multi-Microprocessor Based Parallel System" in H. Burkhart (Ed.): CONPAR '90 - VAPP IV, LNCS 457, Springer-Verlag, 1990, pp/347-354. Appendix

Information about the Systems

The systems included in this survey and detailed in this Appendix are limited to those for which we were able to obtain data for all major classifications from the literature available to us. In such a fast-moving field, detailed information about the systems changes continuously and new systems are continually being built. As we will be maintaining a (hopefully) complete and accurate database for general reference and possible future editions, we request your help. Please send information about such systems to the authors at the following address:

Parallel Machine Survey
c/o Prof. Larry Rudolph
Institute of Computer Science
The Hebrew University
Givat Ram Campus
Jerusalem 91904, Israel

The following pages give the relevant information about the systems included in this survey. The information was gathered from the cited references; the primary reference citation is set in italics. The entries are organized alphabetically by the primary name of each system. If you cannot locate a system, please check the index, since the system may be listed under a different name.


AAP-1, AAP-2: (C - NTT Electrical Communications Laboratories) Applications: Image Processing. Control: SIMD. Number of PEs: 64K (256x256). Type of PE: 1-bit ALU (8x8/chip). Interconnection Network: 8-nearest-neighbors in two paths: 1: 4-NN, 2: 8-NN. Memory Organization: Local, ≥8Kbit/PE (AAP-2). Performance: 1000 MIPS. Host: Yes. Period: 1982 - AAP-1, 1986 - AAP-2. References: [231] [313]

AAPT (Adaptive Antenna Processor Testbed): (NL + C - RSRE + STC (UK)) Applications: Signal Processing. Control: Dataflow/Systolic Array. Number of PEs: 33: 21 for Triangular Wavefront Array Processor (WAP) + 12 for Data Correction WAP. Type of PE: TI TMS32010, 200ns CP. Interconnection Network: 2-D mesh. Memory Organization: Local. Performance: 150 MIPS. Period: 1987 - prototype. References: [262]

ACE (8CE): (CR - IBM T. J. Watson Research Center) Applications: General. Control: MIMD. Number of PEs: ≤8. Type of PE: ROMP-C (IBM PC/RT RISC processor) + Rosetta-C MMU. Interconnection Network: Bus (Message Passing or Shared Memory), 32-bit, 80MB/s BW Inter-Processor Communication (IPC) bus. Memory Organization: Global, 16-128MB: 1-8 boards x 16MB (No. of boards = 9 - No. of PEs) + Local, 8MB/PE. Host: IBM PC-RT. Period: 1988 - Operational. References: [62] [408]

ACP: (NL - Fermilab) Applications: Particle Physics Simulations. Control: MIMD. Number of PEs: ≤144. Type of PE: MC68020 + AT&T 32000 (Phase II), MIPS R2000 (Phase III). Interconnection Network: Hierarchical buses - 1 Branch bus + 8 Local buses. Memory Organization: Local. Period: 1988 - ≈100 PEs (Phase II). References: [363]

AGM: (A + C - UC Irvine + Hyperstore Systems Inc.) Applications: General. Control: Dataflow. Number of PEs: 16. Type of PE: 2xMC68020 + FP Coprocessor (optional) + 256KB cache. Interconnection Network: 4-nearest-neighbor torus (leftmost column of processors are interface processors, connected to host). Memory Organization: Local, 2-8MBjPE + 200MB disk. Host: Yes, via leftmost column. Period: Operational by 1989. References: {49} [50}

AHR: (A - National U. of Mexico) Applications: General (LISP Processing). Control: Dataflow. Number of PEs: Several dozen. Type of PE: Z-80. Interconnection Network: Bus. Memory Organization: Global, "Grill" - 512KW (32-bit) [8KW in Ver. I], 55ns access time; Passive Memory - 1MW (22-bit) [64KW in Ver. I], 150ns access time; Variable Memory - 512KW (32-bit) [16KW in Ver. I], 150ns access time + Local, 16KB/PE (RAM + ROM). Host: "Distributor": control distribution. Period: 1981 - 5-PE prototype (Version I). References: [161]

AIS-5000: (C - Applied Intelligent Systems Inc.) Applications: Image Processing (General). Control: SIMD. Number of PEs: :::;1024 (:::;8 cards x 16 Gate array chips/card x 8PEs/Gate array chip). Type of PE: Custom, bit-serial. Interconnection Network: Linear + p3 Bus among cards, 2 cycles/data transfer. Memory Organization: Local, 32Kbit/PE. Performance: 298MOPS (8-bit add)/1024 PEs. Host: Yes, MC68000-based, via VMEbus. Period: 1987 - Available commercially. References: [360]

ALAP: (CR - Hughes Aircraft) Applications: General. Control: SIMD. Number of PEs: 1 or more 112-PE wafers. Type of PE: Custom, 1-bit, 1μs CP. Interconnection Network: Linear + bus connection to control. Memory Organization: Local, 64-bit/PE. Host: NOVA 800. Period: 1974/75 - 112 word cells on a single wafer. References: [125]

ALICE: (A - Imperial College, London) Applications: Inference. Control: Data-Driven. Number of PEs: 16. Type of PE: Each PE: 5 Transputers. Interconnection Network: Delta Network, using 4x4 switches, :::; 150Mbit/s each. Packet addresses are passed. Memory Organization: Global, 26 Packet-Pool Segments (2MB each) + Local, 64KB/PE. Performance: 150K rewrites/s,128 Ms/Packet. Period: 1985 - prototype. References: [48] [89] [169] [174] [196] [407] [444] [449] Appendix. Information About the Systems 149

Alliant FX/Series (4 & 8, 40 & 80, 2800): (C - Alliant) Applications: General. Control: Multiple Vector Processors. N umber of PEs: In FX/x & FX/xO: 8 Computational Elements (CEs) + 12 Instruction Processors (IPs). In FX/2800: -5:.7 Processing Modules, each: 2 "Super CEs" (SCEs) & 2 "Super IPs" (SIPs). Type of PE: CE: Custom + Weitek 1064/65; IP: MC6S020, 170ns CP (FX/S); BIT2110/2120, 45ns CP (FX/SO); i860 in SCE & SIP, 25ns CP (FX/2800). Interconnection Network: Multiple buses: 2 for Data & addresses (64+2S-bit each), 1 for control. Memory Organization: Global, -5:.256MB + caches (separate for CEs & IPs). Performance: Peak: 11.SMFLOPS/PE (FX/4,S); 23.6MFLOPS/PE (FX/40,80); 40MFLOPS/SCE & 24MIPS/SIP (FX/2S00). Tested: 1.6, 2.3MFLOPS/PE (FX/8 & 80, resp.) on L100; 27MFLOPS (FX/8) on L300. Period: 1985 - FX/4, /S; 19S5 - FX/40, ISO; 1990 - FX/2S00; over 30 installed by 9/1990. References: [8] (1071 [117] [145] [194] [20S] [215] [259] [292] [375] (4341 [443]

Amdahl 5995 "A" or "M": (C - Amdahl) Applications: General. Control: MIMD. N umber of PEs: "A" models: 1-4; "M" models: 3-S. Type of PE: Custom, IOns CP on "A," 7ns CP on "M." Interconnection Network: Via shared memory. Memory Organization: Global, -5:.1GB on "A," -5:.2GB on "M" + Extended Memory: -5:.2GB on "A," -5:.SGB on "M." Performance: ~30MIPS/PE ("A"); ~50MIPS/PE ("M"). Period: Announced - 9/1990. References: [362] 150 Appendix. Information About the Systems

Ametek Series 2010: (C - Ametek) Applications: General. Control: MIMD / Multiple Vector Processors. N umber of PEs: 4-1024. Type of PE: MC68020, 40ns CP + 68881 or 68882 or Weitek vector processing board via VMEbus. Interconnection Network: 2-D Grid among custom Mesh Routing Chips, 8-bit links + remote PE point-to-point (Wormhole) connection, ~20MB/s per link, message passing. Memory Organization: Local, ::;8MB/PE, 50MB/s BW + ::;10MB on vector board. Performance: 4MIPS/PE, 420KFLOPS/PE (MC68881) or 630KFLOPS/PE (MC68882) or 20MFLOPS/PE (vector processing board). Host: Sun-3. Period: 1988 - Announced. References: [ll} [261] [347}

Ametek System/14: (C - Ametek) Applications: General. Control: MIMD. Number of PEs: 16-256. Type of PE: i80286/i80287, 125ns CP + i80186-based communication processor. Interconnection Network: Hypercube, Maximal BW: 3Mbit/s. Latency: 350μs. Transmission Time: 9.53μs/byte. Memory Organization: Local, 1MByte DPR. Performance: 12-15MFLOPS (fully configured), ≈1MIPS/PE. Host: VAX/UNIX. Period: 1985 - available commercially. References: [215] [259] [389]

AMP-1: (A - U. of Illinois) Applications: General. Control: MIMD. Number of PEs: 8. Type of PE: MC6800, 1μs CP. Interconnection Network: Read and write buses, 8-bit wide, 125ns cycle (8 bus windows per CPU CP). Memory Organization: Global, 64 MMs x 1KB, using Intersil 6518 chips, with shared variables. Host: DEC System 10 host/controller, via BBX interface. Period: 1978/79 - prototype. References: [96]

ANMA: (NL - Electrotechnical Laboratory) Applications: General. Control: SIMD. Number of PEs: 16. Type of PE: Custom, 8-bit, based on 2xAm2901, which can be chained to give longer words. Interconnection Network: Logarithmically Structured Transfer (LST): PEs i & j are connected if j = i ± 2^k (mod N), for k = 0, 1, 2, ...; 900ns cycle, BW: 16x1.1MB/s. Memory Organization: Local, 4KB/PE. Host: VAX/VMS or 8080A + 64KB RAM, connected via a DPR to a 16-bit custom controller. Period: 1976 - prototype. References: [314]
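The LST connectivity rule quoted in the ANMA entry is compact enough to check mechanically. The following is only an illustrative sketch: the function name and the example value N = 16 are ours, not taken from the ANMA literature; it simply enumerates the neighbours of a PE under j = i ± 2^k (mod N).

```python
# Illustrative sketch of the Logarithmically Structured Transfer (LST) rule:
# PEs i and j are linked iff j = i +/- 2^k (mod N) for some k = 0, 1, 2, ...
# The function name and the example N = 16 are ours, purely for demonstration.

def lst_neighbours(i: int, n: int) -> set[int]:
    """Return the set of PEs directly connected to PE i in an n-PE LST network."""
    neighbours = set()
    k = 0
    while (1 << k) < n:          # consider strides 1, 2, 4, ... below n
        neighbours.add((i + (1 << k)) % n)
        neighbours.add((i - (1 << k)) % n)
        k += 1
    neighbours.discard(i)        # a PE is not its own neighbour
    return neighbours

if __name__ == "__main__":
    # With ANMA's 16 PEs each node has at most 2*log2(16) = 8 links.
    print(sorted(lst_neighbours(0, 16)))   # -> [1, 2, 4, 8, 12, 14, 15]
```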

APIOOO: (CR - Fujitsu Laboratories) Applications: General. Control: MIMD/SPMD. Number of PEs: 16-1024. Type of PE: SPARC IU + FPU, 40ns CP + 128KB Cache + Message Controller. Interconnection Network: 3 Networks: Torus (T-net) accessible via Routing Controllers, 16-bit, 25MB/s each channel, 50ns delay at nodes; Broadcast (B-net) bus and Synchronization (S-net) ring. Memory Organization: Local, 16MB DRAM/PE. Performance: Peak: 0.53-8.53GFLOPS (SP); 0.36-5.69 (DP) using 64-1024 PEs; tested: 2:300MFLOPS/256 PEs on 1000x1000 LU-decomposition, 446MFLOPS/256 PEs, 61OMFLOPS/512 PEs on L1000. Host: Sun 4/330 via VMEbus. Period: By 1990 - 256 PEs; by 1992 - 512 PEs; 13 machines installed by 9/1992. References: [lB'l} [390} [39i} 152 Appendix. Information About the Systems

Apollo Series 10000: (C - Hewlett-Packard Apollo Division) Applications: General (Graphics). Control: MIMD. Number of PEs: 4 CPUs + Graphics Engine: 5x8-bit ALUs, linear & quadratic interpolation units. Type of PE: CPU: "Prism" RISC processor, 32-bit integer and 64-bit FP + 128KB instruction cache & 64KB data cache, 55ns CPo Interconnection Network: "X-Bus," 64-bit wide, 160MB/s. Memory Organization: Global, 700MB + 42MB VRAM + 24MB DRAM in Graphics Engine, on crosspoint network, lOOns cycle. Performance: Peak: 140MFLOPS. Tested: 6MFLOPS/PE on LIOO (DP); 9MFLOPS/PE on coded BLAS (SP). Period: 3/1988 - Available Commercially. References: [91} [225}

Ap2S: (CR - Lockheed Palo Alto Research Lab.) Applications: Image Processing. Control: SIMD + MIMD Controller. Number of PEs: ~lO,OOO. Type of PE: GAPP PEs (originally) or Custom PEs (current). Interconnection Network: 4-nearest-neighbor mesh. Memory Organization: Local. Host: Yes + Distributed Macro Controller. Period: Constructed by 1990.

APS Family: (A - Bulgarian Academy of Science) Applications: General, Graphics. Control: MIMD. Number of PEs: 1-16 in Personal Computer class; 16-96 in Workstation class; 64-256 in Mini-supercomputer class; 512-1024 in Supercomputer class + File subsystem, Visualization subsystem, DSP subsystem, LAN subsystem, special-purpose I/O modules. Type of PE: Transputers: T800 & T414, used also in subsystems. Interconnection Network: Reconfigurable, based on dual crossbar in PC, WS classes. Memory Organization: Local: 1MB/PE in PC class, 2MB/PE in WS class. Host: Yes + Controller. Period: PC, WS class models built by 1990. References: [66} Appendix. Information About the Systems 153

APx (x = No. of PEs): (C - Visionary Systems) Applications: General. Control: SIMD. Number of PEs: 64-256 (1-4 boards). Type of PE: Custom RISC 16-bit, every 2 can form a 32-bit PE. Interconnection Network: 4-nearest-neighbor mesh + bus. Memory Organization: Shared, 8-32MB on AP256, available at 800MB/s + Local, 512B, available at 3200MB/s. Performance: 40-160MFLOPS (32-bit), 800-3200 MIPS. Host: Yes - workstation or PC. Period: Built by 1988. References: [2}

ARES: (A - Keio U., Japan) Applications: Pattern Recognition/Associative Search. Control: Multiple SIMD. Number of PEs: 64 (8 clusters: 8 cells + Multiple Response Resolver each). Type of PE: TI SN745481. Interconnection Network: Bus. Memory Organization: Local, 16KB /PE. Host: Yes + Master Control. Period: 1978/79 - operational. References: [5} [199}

Armstrong I, II: (A - Brown U.) Applications: General. Control: MIMD. N umber of PEs: ~75. Type of PE: Armstrong I: MC68010, lOOns CP; Armstrong II: MC68030/MC68882, 30ns CP Execution Processor + MC68020 (30ns CP) & Xilinx FPGA in Communication Processor. Interconnection Network: Reconfigurable, based on 8 network ports/PE and 5MB/s serial links, message passing. Memory Organization: Local, 512KB/PE in Armstrong I, 4-16MB/PE in Armstrong II. Period: 1986 - Armstrong Ij 1991 - Armstrong II PE prototype. References: [25} 154 Appendix. Information About the Systems

ASAP: (C - Martin Marietta) Applications: Signal Processing. Control: SIMD, Multiple-SIMD, MIMD. Number of PEs: Multiples of 256-PE Unit Array (16 chips/unit array x 16 PEs/chip). Type of PE: Custom, bit-serial, 50ns CP + FPU. Interconnection Network: 4-nearest-neighbor mesh + broadcast bus on columns + crosspoint from I/O to RAM and from RAM to PEs (160MB/s BW) + bus (VME/Multibus). Memory Organization: Local, 1200 bit/PE on-chip + off-chip. Performance: 91MFLOPS/Unit Array (32-bit addition). Host: AMD 2910-based ASAP controller. Period: 1986 - first chip; 1987 - chip redesign (ASAP-II); 1024-PE prototypes built. References: [166}

ASP, WASP 1, 2a, 2b ([Wafer-Scale-] Associative String Processor): (A + C - Brunel U. + Aspex Microsystems) Applications: General (Image Processing). Control: Multiple SIMD. Number of PEs: ≤64K APEs: ≤8 MPP (Massively Parallel Processors) x several PPs (Parallel Processors, of various forms: with/without local memory, local control) x several ASP substrings x many Associative PEs (APEs); in ASP: one or more chips of 256 APEs; in WASP 1: 1 ASP x 4 APPs x 36 APEs; in WASP 2a: 6 ASPs x 4 APPs x 36 APEs; in WASP 2b: 15 ASPs x 12 APPs x 36 APEs. Type of PE: Limited function (match, add, read, write) PEs, with bit-serial 8-bit additions. Interconnection Network: Reconfigurable: shift register in APP, full connection in ASP on WASP 1, linear in ASP on WASP 2; linear among ASPs. Memory Organization: Global + Local, 32-bit/PE on 256-PE SCAPE chip. Performance: ≤40GOPS/64K APEs, 25ns CP on 12-bit add. Period: 1981 - test chips (by Plessey); 1986 - SCAPE chips (Plessey); 3Q1988 - WASP 1; 4Q1989 - WASP 2a; 1Q1990 - WASP 2b. References: [205] [206] [246] [247] [248] [249] [250]

ATOMS: (CR - AT&T Bell Labs.) Applications: -Behavior Studies. Control: SIMD. Number of PEs: ~16. Type of PE: AMD29117-based controller + 4 Weitek 1164/1165 chip sets. Interconnection Network: VME bus. Memory Organization: Shared (Memory access by turns). Period: 1987/88 - 2 prototypes. References: [31} [364]

BBN Butterfly, GP1000, TC2000 models: (C - BBN Advanced Computers) Applications: General (Signal Processing). Control: MIMD. Number of PEs: 2-256 on GP1000; ≤512 on TC2000. Type of PE: Butterfly: MC68000, 125ns CP; GP1000: MC68020 + MC68881 FPA, 62.5ns CP (rest of PE - Memory Management, I/O bus, Switch interface at 125ns clock); TC2000: MC88000 RISC, 50ns CP. Interconnection Network: Butterfly switch to memory, packet switching; in GP1000: 4 stages of 4x4 crossbars, 4-bit-wide channels, 125ns cycle, latency: 1.8μs, BW: 32Mbit/s per path; in TC2000: 8x8 crossbars, 8-bit-wide channels, 26.3ns cycle, BW: 38MB/s per channel. Memory Organization: Distributed, in GP1000: 1-4MB/PE, local reference: 2μs, remote reference: 6μs; TC2000: 16/32MB per PE (using 1Mbit/4Mbit chips) on FP boards, 4/16MB per PE on FPV board (with VMEbus connection). Performance: Peak: Butterfly - 0.5MIPS/PE; GP1000 - 2.5MIPS/PE; TC2000 - 19 VAX MIPS/PE, 20MFLOPS/PE (32-bit); measured, GP1000: 125MFLOPS. Host: Some UNIX machine or Symbolics W/S. Period: 1983 - available commercially; 1987 - GP1000; 7/1989 - TC2000 announced. References: [7] [8] [29] [39] [48] [145] [194] [196] [208] [215] [245] [259] [292] [376] [443] [444]

BBN Monarch: (C - BBN Advanced Computers) Applications: Engineering & Scientific. Control: MIMD. Number of PEs: ~64K. Type of PE: Custom, 64-bit, each containing 2 ALUs, 1 mult/div unit, 1 add/subtract unit, & I/O unit. Interconnection Network: Multi-Staged Network, circuit & packet switched, 2-bit-wide paths, 5.7ns cycle. Memory Organization: Global, IMB/PE (in 2MB modules) available at 8MB/s per PE. Performance: 6 MIPS/PE, 2 MFLOPS/PE. Period: 1990 - implementation.

B-HIVE: (A - North Carolina State U.) Applications: General. Control: MIMD. Number of PEs: 24. Type of PE: NS32016 + FPU + Communication Processor. Interconnection Network: Alpha Structure - a generalized hypercube, with more than two PEs in each direction (2x3x4 matrix), message passing. Memory Organization: Distributed shared (global virtual memory), with DPR between Application Processor and Communication Processor. Period: 1986/87 - prototype. References: [170}

BiiN 20, 60: (C - BiiN (an Intel/Siemens cooperation)) Applications: General. Control: Object Oriented. Number of PEs: 1-2 (20), 2-8 (60), 2/board, ≤12 boards for CPUs, Memory & I/O. Type of PE: Intel 80960XA RISC Processor, 62.5ns CP, + cache: 128KB (20), 256KB (60). Interconnection Network: Bus(es) (1 in 20, 2 in 60), 32-bit wide, 40MB/s per bus BW (for messages and memory access). Memory Organization: Shared, 20: ≤32MB; 60: ≤80MB (8 or 16MB/module) + Local, 20: 128KB; 60: 256KB. Performance: 20: 4.5-9 MIPS; 60: 11-40 MIPS. Period: 7/1989 - first products shipped; 10/1989 - firm closed. References: [326] [327]

BLITZEN: (A - Microelectronics Center of North Carolina, Duke U., North Carolina State U.) Applications: Signal Processing & General. Control: SIMD + local conditional control, local modification of RAM addressing. Number of PEs: :<:;16K (128 chips x 128 PEs/chip). Type of PE: Custom, bit-serial, full adder, :<:;50ns CPo Interconnection Network: 8-nearest-neighbor mesh ("X-Grid") + I/O bus (1 per 16 PEs). Memory Organization: Local, lKbit/PE. Performance: 2450GFLOPS/16K PEs (32-bit); 21GIPS (32-bit fixed). Host: Yes. Period: 1989 - chips built. References: [58}

BSP: (C - Burroughs Corp.) Applications: General. Control: SIMD. Number of PEs: 16 Arithmetic Elements (AEs) + Scalar Processor. Type of PE: AE: custom, pipelined, 160ns CP; SP: 83ns CPo Interconnection Network: Multistaged ("Alignment") network, circuit switching. Memory Organization: Global: Parallel Primary: 17 MM (512KB each, 160ns cycle, BW: lOOMW Is) + instruction (control) memory - 256KW (48-bit) + Secondary: File Memory (CCD technology), 4-64MW, 75MB/s. Performance: Measured: 80-100MFLOPS. Host: B7800. Period: 1979 - prototype built; project suspended. References: [123] [195] [417} [418] 158 Appendix. Information About the Systems

B-SYS (Brown Systolic Array): (A - Brown U.) Applications: Pattern Recognition (Built for the Human Genotype Project). Control: SIMD (programmable systolic array). Number of PEs: ~1504 (32 chips x 47 PEs/chip). Type of PE: Custom, 8-bit data + 38-bit instruction, 20ns CP, 16 CP /instruction. Interconnection Network: Linear Array (via shared registers). Memory Organization: Local + 2-way shared register banks (16x8-bit). Performance: Tested: 108 MOPS/470 PEs (8-bit); peak: 3.3 MOPS/PE. Host: Intel80386-based. Period: Fall 1990 - 470-PE prototype. References: [193}

BTl 8000: (C - BTl Computer Systems) Applications: General. Control: MIMD. Number of PEs: ~13 (usually ~6). Type of PE: Custom microprocessor, 32-bit. Interconnection Network: Bus, 60MB/s, with 16 resources' slots on it: 1 - System Unit, and at least one each of: CPU, MM, Peripherals Unit. Memory Organization: Global, ~13 MMs, 128KB each (~512KB per user). Period: 1979 - available commercially. References: [257}

Bull DDC: (CR - Bull SA (France)) Applications: Database Management. Control: Reduction. Number of PEs: ≤256 (≤16 clusters of 8-16 PEs). Type of PE: MC68020 + custom symbolic coprocessor. Interconnection Network: Hierarchical: Cluster (local) + Global. Type of local network depends on cluster use: barrel shifter for disk-based relations; wide bus for memory-based relations. Message passing. Memory Organization: Local. Performance: 100MIPS. Period: 1987 - Prototype. References: [153] [443]

Burroughs B7700 (B6700) Series: (C - Burroughs) Applications: General. Control: Multiple Vector Processors. Number of PEs: 1-4 (1-3 in B6700). Type of PE: Custom, 62.5ns CP (200-400ns CP on B6700) + Vector Processor + 32W local program buffer. Interconnection Network: Crosspoint (8 requesters, 8 MMs maximum). Memory Organization: Global, ≤1MW: ≤8 MMs, ≤1.5MB/MM (1W = 48-bit data + 3-bit control + 1-bit parity), 88ns read cycle. Period: 1969 - Announced. References: [112] [356, chapter 6] [399, pp/99-104]

BVM: (A - Duke U. & U. of North Carolina, Chapel Hill) Applications: General. Control: SIMD. Number of PEs: ≤1M. Type of PE: Bit-serial, 250ns CP. Interconnection Network: Cube-Connected Cycles. Memory Organization: Local + access to memory of 3 neighbors at any one time. Performance: ≤1.2GFLOPS/1M PEs on matrix computation. Host: Ikonas Graphics Buffer. Period: 1983 - 1st prototype; 3-4 versions with ≤8 chips x 64 PEs built until 1985. References: [457]

CAP: (CR - Fujitsu Laboratories) Applications: Graphics (+ General). Control: MIMD. Number of PEs: 64 (8x8) - C3 model; 256 - C5 model. Type of PE: i80186 + i8087 + CAP VLSI chip for window control and global communication. Interconnection Network: Reconfigurable on a 4-nearest-neighbor mesh (message passing) + Command bus + Video Bus. Memory Organization: Local, 2MB RAM + 64KB ROM + 96KW (24-bit) video RAM per PE. Host: Apollo Domain DN460 or similar. Period: 1986 - C3 1988 - C5 (prototypes). References: (203] (354] 160 Appendix. Information About the Systems

Cedar: (A - U. of Illinois at Urbana) Applications: General. Control: Cluster - Multiple Vector Processors; Global - "Macro Dataflow." Number of PEs: ~1024. Type of PE: Planned: Custom - CPU + FPA + Address generator + Processor control; Implemented using Alliant FX/8 as Clusters, 85ns CPo Interconnection Network: Hierarchical: crossbar within cluster; n network, packet switching and made up of 2 stages of 8x8 crossbars between clusters and memory, 80-bit wide, 85ns cycle. Memory Organization: Hierarchical: Global (8MB/PE, 1.51-£s latency, 1W /170ns BW); Cluster memory (32MB, Il-£s access time), cluster cache (128KB, 85ns access), PE memory (16KW, 85ns access rate), PE cache (50ns access rate). Performance: Peak: 11.8MFLOPS/PE on current prototype, measured: 40MFLOPS per cluster. Period: 1986 - current prototype: 4 clusters, each an Alliant FX/8 (8 PEs). References: [7] [8] [143} [144] [182] [183] [194] [232}

CESAR: (NL - Norwegian Defense Research Establishment) Applications: Image Processing (+ General). Control: SIMD (Systolic). Number of PEs: ~4 MALUs, 128 (8xI6) PEs each. Type of PE: Custom, 32-bit (bit serial) with FP capability; 50ns CPo Interconnection Network: 8-nearest-neighbor mesh (Cylinder), 4 input and 4 output; I-bit data paths, 50ns cycle. Memory Organization: Local + 1 buffer memory/MALU. Performance: Peak: 320MFLOPS (4 MALUs); sustained: ~280MFLOPS (32-bit). Host: Yes, e.g., ND-lOO + 68000-based control Processor. Period: 1988 - prototype. References: {439}

Cetia 1000: (C - Cetia (Thomson-CSF) ) Applications: General, Graphics. Control: MIMD. N umber of PEs: ~4. Type of PE: MC88100, 40ns CPo Interconnection Network: VMEbus. Memory Organization: Local, 32MB /PE. Period: 1991/92 - available commercially. References: [340] Appendix. Information About the Systems 161

CHoPP: (A - Columbia U.) Applications: General. Control: Multiple VLIW /Vector Processors. Number of PEs: :=:;16. Type of PE: Each PE: 4 Computational Functional Units + 4 Address Functional Units + Branching Unit, 256-bit instructions. Interconnection Network: Bus (eventually - "conflict-free network"). Memory Organization: Distributed, with full/empty bit on every word. Performance: 4-PE system is slightly to much faster than a Cray X-MP /1 on Livermore Loops. Period: 1985 - CHoPP 1 prototype. References: [7] [8]

CLIP 4: (A + C - U. College, London + Stonefield Systems) Applications: Image Processing. Control: SIMD. Number of PEs: 96x96, 8 PEs/chip. Type of PE: Custom, I-bit. Interconnection Network: 8-nearest-neighbor. Memory Organization: Local, 32-bit/PE. Host: PDP-ll/34. Period: 1979 - first prototype; 1980 - first chip-based prototype; 1985 - available commercially. References: [132} [208] [345]

CLIP 7: (A - U. College, London) Applications: Image Processing. Control: SIMD. Number of PEs: 512x4, 8 PEs/chip. Type of PE: Custom, 16-bit. Interconnection Network: Reconfigurable, linear to 8-NN. Memory Organization: 4KB/PE == 256-bit/pixel, 128 pixels/PE (for processing 512x512 pixel images). Host: Minicomputer + UNIX. Period: 1986 - CLIP 7 A prototype: 256 PEs, linear, 64KB/PE. References: [133} [345] 162 Appendix. Information About the Systems

C-LISP Machine: (A - Kyoto U.) Applications: LISP. Control: MIMD. Number of PEs: 1 Master Processor (MP) + 8 Interpretation Processors (IPs). Type of PE: MC68000, 16-bit, 125ns CP. Interconnection Network: Multiple buses + star-connected interrupt lines; message passing. Memory Organization: Global, 8MB, in three areas: List Area, 1Mcell (x48 bits: 20 - CAR, 20 - CDR, 8 - attributes); PCB and Random Access Area & Control Stack Area - each with its own bus. Period: 1983/84 - prototype. References: [424] [449]

Cm*: (A - CMU) Applications: General. Control: MIMD. Number of PEs: 50 Cm's (5 clusters of 10). Type of PE: DEC LSI-ll (= Cm). Interconnection Network: 3-bus hierarchy: Cm, intra- and intercluster. Shared variables & message passing (packet switching). Memory Organization: Distributed, ~256MB/Cm. Access cost ratio: 1:3:9 (Iocal:cluster:out-of-cluster). Host: PDP-ll/40. Period: 1980 - operational. References: [7] [8] [123] [146} [182] [183]

C.mmp: (A - CMU) Applications: General. Control: MIMD /SIMD /MISD. Number of PEs: 16. Type of PE: PDP-11/20 or 40, 16-bit. Interconnection Network: Crossbar (circuit switched) + bus. Shared variables and message passing. Memory Organization: Global, 32MB address space, 2.7MB physical at 16 ports: 11 - core, 250ns access, 650ns cycle, 8 Memory modules of 16KBj l.71M references/s per port. 5 - MOS, 330ns access, 450ns cycle, 4 memory modules of 64KBj 1.49M references/s + Local, 8KB /PE. Performance: PDP-11/40: 4.3MIPSj PDP-11/20: 3.0MIPS. Period: 1975 - operational. References: [7] [8] [112] [123] [182] [183] [195] Appendix. Information About the Systems 163

Columbia QCD Machine: (A - Columbia U.) Applications: Quantum Chromodynamics. Control: MIMD or SPMD. N umber of PEs: 256. Type of PE: i80286/80287 + TRW vector processing chips (TDC 1022 ALU and MPY16HJ Multiply). Interconnection Network: Torus, PE-to-PE rate: 2MB/s after 600ns, to any node. Memory Organization: Local, 128KB/PE each memory node accessible to 3 PEs in SIMD operation. 45ns access time. Performance: Peak: 16MFLOPS/PE; sustained: 160MFLOPS on 16 PEs. Host: VAX-ll/780. Period: 1984 - 16-PE prototype. References:[77} [183] {433}

Computing Surface: (C - Meiko) Applications: General. Control: MIMD. N umber of PEs: 1000 and more (potentially; installed systems: ~~500). Type of PE: T800 Transputer: 4 in each PE card. Weitek chips Floating Point Cards also available. Interconnection Network: Reconfigurable in a 2-D mesh. Memory Organization: Local: 256KB-48MB/PE (~8MB on 4-PE board, 16MB on 2-PE board, 48MB on 1-PE board); 1.7GB on a 500-PE system at Edinburgh U. Performance: 360MFLOPS for a 311xT800 system. Host: VAX/VMS, PC-AT or MC68000-based. Period: 1985 - available commercially. References: [175} [259] [283} [443]

Concert: (A + C - MIT + Harris Corp.) Applications: General. Control: MIMD. Number of PEs: 32-64 (8 clusters of 4-8 PEs). Type of PE: MC68000, 125ns CPo Interconnection Network: Ring Bus + Arbiter between clusters, PE-to-PE within cluster - multibus, PE-to-memory - high-speed bus. Harris version is to have a different global communication mechanism. Memory Organization: Distributed - cluster memory and PE memory. Period: 1984 - 3-PE prototype; 1987 - 7 clusters of 4 PEs. References: [7] [162} [163} 164 Appendix. Information About the Systems

Concurrent Computer 32xO (x = 2,6,8): (C - Concurrent Computers) Applications: General. Control: MIMD. N umber of PEs: 1--6. Type of PE: Custom. Interconnection Network: Bus, 64MB/s. Memory Organization: Global. Performance: 34MIPS. Period: Available since mid-1980s. References: [208] [412]

Connection Machine-I, 2, 200: (C - Thinking Machines Inc.) Applications: General. Control: SIMD. Number of PEs: 64K + (optional) 2K FPAs (32 or 64 bit). Type of PE: Custom, I-bit ALU, (250ns CP, CM-l), (143ns CP, CM-2), (100 ns CP, CM-200) 16 PEs per chip, Floating Point ALU (32-bit CM-l,2; 32 or 64-bit CM-200). Interconnection Network: 12-cube between 16-PE nodes flip network (butterfly); bit-serial, BW: ~6GB/s (on FFT); + grid, BW: 20Gbit/s. Memory Organization: Local, 4Kbit/PE, accessible at 20Gbit/s (CM-l); 64Kbit/PE (CM-2), 125ns access time; 256Kbit/PE, 100 ns (CM-200). Performance: CM-l: >1000MIPS on 32-bit additions; CM-2: peak- 2.5GIPS (32-bit,) 4GIPS (8-bit), sustained: 20GFLOPS (with FPAs); CM-200: 40 GFLOPS (32 bit-FPA), 20 GFLOPS (64-bit FPA). Host: Symbolics 3600, VAX, or Sun. Period: 1985 - CM-l; 1986 - CM-2 (available commercially), 1991 - CM-200. References: [7] [8] [48] [179} [194] [196] [259] [261] [292] [345] [377] [444] {448} Appendix. Information About the Systems 165

Connection Machine-5: (C - Thinking Machines Inc.) Applications: General. Control: MIMD /SIMD. Number of PEs: ~16K supported on current version, ~256K eventually. Type of PE: SPARC, 30ns CP + FPU, MMU, 64 KB cache + (opt.) 4 Vector Units between PE and memory. Interconnection Network: Dual "Hypertree" (incomplete "fat tree" - a binary tree with wider links at higher levels): one for requests, one for replies, packet passing (~24B/packet); local communication BW: 20MB/s; random communication BW: 5MB/s; + Diagnostic NW + Control NW for synchronization. Memory Organization: Local, ~32MB/PE, DRAM (4x8MB if vector units installed). Performance: Peak: 32MOPs (integer or FP), 64-bit/Vector Unit (128MFLOPS or 128MOPS/PE). Period: 1992 - announced, ~lK PEs (with Vector Units). References: [42] [156]

Convex 2xO Series (x = No. of PEs): (C - Convex) Applications: General. Control: Multiple Vector Processors. N umber of PEs: 1-4. Type of PE: Custom "C-2." multiple functional units, 40ns CPo Interconnection Network: Crossbar switch. Memory Organization: Global, ~2GB, BW - 200MB/s per memory - PE port. Performance: Peak: 50MFLOPS/PE, measured: lOMFLOPS/PE on L100, 126MFLOPS/4PEs on L300. Period: Available since 1988. References: [75] [117] [269]

CORAL: (A - Tokushima U. (Japan)) Applications: General. Control: MIMD. Number of PEs: ~1023. Type of PE: i8085 on first prototype; MC68000 on second prototype, lOOns CPo Interconnection Network: Binary tree, 2MB/s channels. Memory Organization: Local, 512KB/PE. Performance: 40MIPS, 0.3MFLOPS. Host: Yes, Sharp IX-5 (MC68000-based) Unix system, attached to root (on 68K version). Period: 1983 - 15-PE prototype; 1984 - 63-PE prototype; 1987 - MC68000 prototype. References: [301] [313] [429] 166 Appendix. Information About the Systems

Cosmic Cube (Caltech Concurrent Computation Program): (A - Caltech) Applications: General. Control: Multiple Vector Processors. Number of PEs: 2^p (64 in Mk I, 128 in Mk II, III). Type of PE: i8086/8087, 200ns CP in Mk I & II; MC68020 + MC68881 + MC68882 + Weitek XL WTL8000 chips + MC68020-based communication processor on Mk IIIfp. Interconnection Network: Hypercube, with 4-bit-wide data paths, message passing using 64-bit packets at 4Mbit/s on a direct link. Memory Organization: Local, 128KB/PE (Mk I), 256KB/PE (Mk II), 4MB/PE (Mk IIIfp). Performance: 50KFLOPS/PE. Host: Intermediate host such as Sun or VAX-11/780 + node-processor-based host. Period: 1982 - 4-PE prototype; 1983 - 64-PE prototype; 1984 - Mk II; 1987/88 - Mk IIIfp. References: [7] [73] [134] [182] [183] [361] [442] [447]
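As a quick illustration of the hypercube wiring used by the Cosmic Cube and the other hypercube machines in this Appendix (a general property of binary hypercubes, not a detail taken from the entry above): node i is linked to exactly the p nodes whose binary addresses differ from i in one bit. The helper name below is our own.

```python
# Minimal sketch of binary-hypercube connectivity (a general property of
# 2^p-node hypercubes, not specific to the Cosmic Cube entry): node i is
# linked to every node whose address differs from i in exactly one bit.

def hypercube_neighbours(i: int, p: int) -> list[int]:
    """Neighbours of node i in a p-dimensional (2^p-node) hypercube."""
    return [i ^ (1 << k) for k in range(p)]

if __name__ == "__main__":
    # Mk I had 64 = 2^6 nodes, so every node has exactly 6 neighbours.
    print(hypercube_neighbours(0, 6))   # -> [1, 2, 4, 8, 16, 32]
```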

Cray X-MP/ab, Cray Y-MP/ab (a = No. of PEs, b = memory size in MW), Cray Y-MP C90: (C - Cray Research) Applications: General. Control: Multiple Vector Processors. Number of PEs: 1-4 (X-MP), 1-8 (Y-MP), 16 (C90). Type of PE: Cray-1-like: 9.5ns, 8.5ns, 6ns CP on X-MP (1982), X-MP (1986), Y-MP, respectively; 13 functional units. Interconnection Network: Via shared registers & memory - each bank can be accessed in parallel during each cycle. Memory Organization: Global, ≤64MW (64-bit) on X-MP, ≤128MW on Y-MP; 3 words/cycle; 70ns, 25ns cycle time for X-MP/1, X-MP/2,4 resp. Latency on X-MP = 38ns. Performance: X-MP peak: 210MFLOPS/CPU, measured: 39MFLOPS/CPU on L100, 480MFLOPS/4 CPU on L300, 560MFLOPS/4 CPU on PDEs; Y-MP/832 peak: 4GIPS, 3.6GFLOPS. Period: 1982 - X-MP/2; 1984 - X-MP/48; 1987/88 - Y-MP. References: [8] [29] [32] [144] [182] [183] [195] [261] [291] [335]

Cray-2, Cray-3: (C - Cray Research) Applications: General. Control: Multiple Vector Processors. Number of PEs: 2-4 (Cray-2), 1-16 (Cray-3). Type of PE: Extended Cray-1: 4.1ns CP for Cray-2, 2ns CP for Cray-3 (the GaAs version). Interconnection Network: Via a coordination processor and multiported, concurrent access memory. Memory Organization: Global, ≤256MW (64-bit), in 128 banks, 60ns cycle, 1GW/s BW. (Each processor has a 3-parallel access to memory for 2 operand & 1 result streams + Local, 16KW). Performance: Cray-2 peak: 122MIPS, 488MFLOPS/CPU, measured: 18MFLOPS/CPU on L100, 93MFLOPS/CPU on L300. Cray-3 peak: 1GFLOPS/CPU. Period: 1985 - first Cray-2 delivered. References: [42] [59] [182] [183] [194] [443]

CROSS 8: (A - Norwegian Institute of Technology) Applications: Database. Control: MIMD. Number of PEs: 1 Master (host) + 8 Slaves. Type of PE: i80186, 125ns CPo Interconnection Network: Crosspoint with 1KB DPR at intersections, 8-bit-wide data paths; cycle - 500ns, 4 cycles/transfer; BW: 4MB/s. Memory Organization: Local, 256KB/PE + 21MB disk/PE. Performance: ~lMIPS/PE. Host: IBM-PC/XT or compatible. Period: March 1987 - first run. References: [67]

Culler PSC: (C - Culler) Applications: General. Control: Multiple Vector Processors. Number of PEs: 1-2 User Processors + 1 Kernel Processor. Type of PE: User Processor: Custom, VLIW (128-bit) processor, 200ns CP 2 units: "A" machine for instructions, addressing, and memory accesses & "X" machine (several FUs) for mathematical processing; Kernel Processor: Sun or similar (host). Interconnection Network: Multiple buses, both in the PE (two 128-bit buses, each at 25MB/s) and between PEs. Memory Organization: Global: ::;96MB + Local. Performance: 18MIPS (scalar), llMFLOPS (vector) peak. Host: Sun or similar. Period: 1985-1987 - available commercially. References: [292] 168 Appendix. Information About the Systems

CYBA-M: (A - UMIST) Applications: Image Processing. Control: MIMD. Number of PEs: 16. Type of PE: i8080A-1. Interconnection Network: Via multiported memory. Memory Organization: Two banks of shared memory: Global Memory, 16KB, available at 10MB/s; Image Memory, 16KB, available at 2.5MB/s + Local, 32KB/PE. Host: PDP-11/34. Period: 1983 - prototype. References: [92] [337]

Cyber 875, Cyber 990: (C - CDC) Applications: General. Control: Multiple Vector Processors. Number of PEs: 2. Type of PE: Custom: 25ns CP on 875, 16ns CP on 990. Interconnection Network: Via multiported shared memory. Memory Organization: Global, ~8MB, 75ns cycle on 875, ~32MB, 64ns cycle on 990. Performance: 32MIPS on 875, 52MIPS on 990. Period: Available since around 1984. References: [32]

CyberPlus: (CR - CDC) Applications: General. Control: Multiple Vector Processors. Number of PEs: 256. Type of PE: Custom, VLIW, 15 functional units + 3 FP functional units, 20ns CP. Interconnection Network: Dual rings among each 16 PEs (system, application), 20ns cycle, packet switching at 1 packet (16-bit data, 13-bit address) per cycle (i.e., 25Gbit/s each ring). Memory Organization: Global (per 64 PEs), accessed via dual memory rings: Processor (64-bit, 20ns cycle, 3.2Gbit/s) & Central (64-bit, 80ns cycle, 800Mbit/s) + Local, 256KW (64-bit) floating point + 16KW (16-bit) integers + 4KW (240-bit) instructions. Performance: 650MIPS, 65MFLOPS/PE (64-bit), 103MFLOPS/PE (32-bit). Host: Cyber 800. Period: 1985 - available commercially. References: [182] [183] [190] [194] [208] [443]

Cydra 5 (MXCL-5): (C - Cydrome Inc.) Applications: General. Control: Dataflow. Number of PEs: 6 + Numeric Processor + 2 I/O Processors. Type of PE: MC68020 (Interactive Processor); custom FPU, 62.5ns CP. Interconnection Network: Bus, 100MB/s. Memory Organization: Global, 32-512MB, 4 MMs + support memory, 8-64MB. Performance: 25MFLOPS (64-bit), 50MFLOPS (32-bit) in FPU; 175MOPS, 15.4MFLOPS on L100, 5.8MFLOPS on Livermore Loops. Period: During 1988 - available commercially; 9/1988 - closed down. References: [332]

Cyto I: (C - Cyto Systems) Applications: Image Processing. Control: SIMD (Systolic). Number of PEs: 88. Type of PE: Custom, 8-bit. Interconnection Network: Linear. Memory Organization: Local. Performance: 140MOPS. Host: VAX-11/780. Period: 1981 - available commercially. References: [415]

Cytocomputer I: (NL - Environmental Research Institute of Michigan (ERIM)) Applications: Image Processing. Control: SIMD (Systolic). Number of PEs: 2 pipelines: 1 - 88 "2-D" transformation stages; 2 - 25 "3-D" transformation stages. Type of PE: LUT based: 650ns CP; 1-bit 2-D cells; 8-bit 3-D cells. Interconnection Network: Linear. Memory Organization: Local, 9 pixels/cell. Performance: 1.6M pixels/s. Host: PDP-11/45 connected to the controller's DMA. Period: 1979 - prototype. References: [260] [414]

D825: (C - Burroughs) Applications: General (Command and Control). Control: MIMD. Number of PEs: ≤4. Type of PE: Custom. Interconnection Network: Crosspoint to memory, 62.5ns cycle. Memory Organization: Global, 16MMs x 4KW x 48-bit. Period: 1960 - first disclosure; 1962 - first installation. References: [13] [112]

DADO: (A - Columbia U.) Applications: Expert & Production Systems. Control: Partitionable SIMD/MIMD. Number of PEs: 2^p - 1; p = 4-16 (1023 in DADO II). Type of PE: Intel 8751 (8-bit, 12MHz) + RAM chip + I/O chip. Interconnection Network: Binary tree, message passing. Memory Organization: Local, 20KB/PE, 2 x 8K in DADO II. Performance: ~0.5MIPS/PE (1.8μs average instruction time). Host: VAX-11/750 in DADO II. Period: 1983 - 15-PE DADO I; 1985 - 1023-PE DADO II. References: [7] [8] [48] [196] [251] [419] [420] [443] [444]
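The 2^p - 1 figure is simply the size of a complete binary tree of depth p, which is where the 15-PE DADO I and 1023-PE DADO II counts come from; a minimal sketch of the tree arithmetic (the 0-based array indexing is illustrative, not DADO's own numbering):

```python
# Complete binary tree: node count and child indices (0-based array layout).
def tree_size(depth: int) -> int:
    return 2**depth - 1

def children(i: int) -> tuple[int, int]:
    return 2 * i + 1, 2 * i + 2

assert tree_size(4) == 15      # DADO I prototype
assert tree_size(10) == 1023   # DADO II
```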

DAFIM: (A - Institute of Technical Cybernetics (Czechoslovakia)) Applications: Knowledge Processing, Inference. Control: Dataflow. Number of PEs: "Highly Parallel"; 3 PEs/cluster. Type of PE: Transputer-based. Interconnection Network: Via shared memory within cluster; hexagonal mesh among clusters. Memory Organization: Shared, 3 Structure Memory Modules/cluster, interspersed with PEs. Period: 1 cluster built by 1988. References: [57]

DAMP: (A - U. of Paderborn (Germany)) Applications: General. Control: MIMD. Number of PEs: 8, 64. Type of PE: T800 Transputer. Interconnection Network: 8-nearest-neighbor torus, circuit switched, implemented using a local switching network based on the INMOS C004 crossbar switch + 8 link adaptors (INMOS C012)/PE. Memory Organization: Local, ≤8MB/PE. Period: 8-PE system built by 1991, 64-PE system being implemented in 1991. References: [37] [38]

DAP: (C - ICL (AMT)) Applications: General. Control: SIMD. Number of PEs: 32 x 32 ("mini") to 64 x 64 (standard). Type of PE: Bit-serial, 150ns CP. Interconnection Network: Grid + Half Column/Row Hop + row/column bus for broadcast and controller communications (bit-serial paths). Memory Organization: Local, ≤1Mbit/PE. BW = 820MB/s, 150ns cycle. Performance: 30MOPS on 1K-PE version (AMT DAP-500), 186MOPS (32-bit) on 4K version (measured). Peak: 60MFLOPS (64-bit); 15MFLOPS measured on 4K version; up to 10GIPS on Boolean operations. Host: VAX or Sun or ICL 2900. Period: 1976 - available commercially. References: [7] [8] [128] [194] [261] [336] [345] [402] [443]

DASH: (A - Stanford U.) Applications: General. Control: MIMD. Number of PEs: ≤64: ≤16 (4x4) clusters x 4 PEs. Type of PE: Each cluster - Silicon Graphics POWER Station 4D/340: 4 PEs, each: MIPS R3000 + R3010 FP coprocessor + 64KB I-Cache + 64KB D-Cache, 30ns CP + 256KB second-level data cache, 62.5ns cycle. Interconnection Network: 2 x 2-D mesh (Request, Reply) among nodes + Wormhole Routing, 240MB/s BW; MPBUS within cluster, 32-bit data, 64-bit address, ≤64MB/s BW. Memory Organization: Distributed, ≤256MB, 1 MM/Cluster. Performance: Peak: 25 VAX MIPS/PE, ≤10MFLOPS/PE; tested: 4MFLOPS/PE on Linpack, 8.8MFLOPS/PE (DP) on Matrix Multiplication. Period: 1990 - 16-PE prototype - 4 (2x2) clusters; 1992 - 32-PE prototype. References: [254] [255] [256]

Data Transport Computer: (C - Wave Tracer) Applications: Signal Processing. Control: SIMD. Number of PEs: 4K-16K. Type of PE: Custom, simple. Interconnection Network: 3-D Array (≤32x32x16 nodes). Memory Organization: Local, 32KB/PE. Performance: ≥1GOPS. Period: 7/1990 - announced. References: [365]

DataWave: (C - ITT Intermetall (Germany)) Applications: Image Processing, Video Compression/Decompression. Control: Systolic/Dataflow. Number of PEs: 16/chip. Type of PE: Custom, Harvard Architecture RISC, 12-bit, 8ns CP. Interconnection Network: 4-nearest-neighbor mesh, 1125MB/s each link (on-chip), 750MB/s each link (off-chip) + bus switch at each edge for I/O. Memory Organization: Local: 16 registers/PE. Performance: Peak: 250MOPS/PE (half +, half x). Host: Sun SparcStation + DataWave Evaluation Board. Period: Implemented by 1991. References: [359] [400]

DATIS-P: (A - University of Saarbrücken) Applications: Numeric, Computer Graphics, APL. Control: MIMD. Number of PEs: ≤256. Type of PE: MC68020 + MC68881. Interconnection Network: Benes network, fault tolerant + control bus. Memory Organization: Local. Performance: 0.1MFLOPS/PE. Host: Yes. Period: 1990 - 1 + 32 PEs. References: [319]

DDDP: (CR - OKI Electric Industry) Applications: General. Control: Dataflow. Number of PEs: 4 (in prototype). Type of PE: AMD2901, 320ns CP. Interconnection Network: Ring Bus, 160ns cycle, 5 cycles/transfer (single precision), 7 cycles/transfer (double precision). Memory Organization: Local, 30KB/PE. Performance: 0.7MIPS overall. Host: Yes. Period: 1982 - prototype. References: [8] [226]

DDP: (CR - Texas Instruments) Applications: General. Control: Dataflow. Number of PEs: 4. Type of PE: Custom, 32-bit minicomputer, based on TI-990 components. Interconnection Network: Ring (Circular Shift Register). Memory Organization: Local, 32KW (x32-bit data + 4-bit tag). Host: TI-990/10 + 19xKB RAM. Period: 9/1978 - prototype operational. References: [8] [209] [445] [449] [476]

Delft Parallel Processor (DPP81, DPP84): (A - Delft Technical University) Applications: Numeric - Simulations of physical systems. Control: MIMD. Number of PEs: 16. Type of PE: 8-bit AM9511a with an optional Weitek-based vector processor (DPP84). Interconnection Network: Full interconnection. Optical backplane is under construction. Memory Organization: Local. Performance: 20KFLOPS/PE; 20MFLOPS/vector processor. Host: Sun-3. Period: 1981 - first implementation (DPP81); 1984 - second implementation (DPP84); 1990 - 8 vector-processing units were added. References: [98]

Denelcor HEP-1: (C - Denelcor) Applications: General. Control: Multiple Vector Processors. Number of PEs: ≤16. Type of PE: Custom, 100ns CP, pipelined for data and instructions (up to 50 processes supported in each PE, with at most one instruction per process in the pipeline at any time). Interconnection Network: Multistaged, pipelined, packet-switching network, 10MB/s on each path. Memory Organization: Global, ≤1GB, 50ns cycle (≤128 modules). Performance: ~10MIPS/PE (for 8 independent processes per PE - 1.25MIPS per process). Period: 1981-1985 - available commercially (largest built - 4 PE, 4 MM). References: [7] [8] [32] [144] [182] [194] [234] [259] [261] [299] [405]
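The per-process number in parentheses is just the PE's issue rate divided among the interleaved processes, since each process may have only one instruction in the pipeline at a time; a one-line illustration of the 8-process case quoted above:

```python
# HEP-style interleaved multithreading: the PE's issue rate is shared
# evenly among the independent processes kept in its pipeline.
def mips_per_process(pe_mips: float, processes: int) -> float:
    return pe_mips / processes

print(mips_per_process(10.0, 8))   # 1.25 MIPS per process, as quoted in the entry
```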

DFM (Datarol): (CR - NTT) Applications: List & Symbolic Processing. Control: Dataflow. Number of PEs: 1000s (in clusters of 8). Type of PE: Custom. Interconnection Network: 3 Nets: Inter-PE (based on an Ω network), result & instruction (buses in 8-PE prototype). Memory Organization: Global: Structure Memory (32Kx71-bit) + Local: instruction (8Kx53-bit) + operand (32Kx43-bit). Performance: 1.8MOPS/PE. Host: VAX-11/750 (in prototype). Period: 1986 - 8-PE + 8-SM prototype; 1024-PE system built by 1992. References: [8] [9] [174] [313] [476]

DIPOD: (NL + C - RSRE + Logica Space and Defence Systems (UK)) Applications: Image Processing. Control: Dataflow. Number of PEs: ≤14. Type of PE: MC68000 Controller + AMD29000-based 16-bit bit-slice processing engine. Interconnection Network: Bus, 20MB/s: 100ns cycle, 16-bit data, 16 control lines. Memory Organization: Local, three sections: Common (to controller and processing engine), Bank Switch (two copies of the data), and Tagged Program. Performance: 5-8MIPS (total). Host: Plexus 35 Development and Control Node. Period: 1986 - 6-PE prototype. References: [208] [401] [402]

DIRECT: (A - U. of Wisconsin) Applications: Relational Database. Control: MIMD. Number of PEs: 9. Type of PE: 1xPDP-11/40 + 8xPDP-11/03. Interconnection Network: Crosspoint (to memory). Memory Organization: Global: 32MMs, 16KB/MM & Local: 28KW/PE. Performance: (Poor). Host: PDP-11/45. Period: 1980 - operational (abandoned in 1982). References: [103]

DIRMU-25 (Kit): (A - U. of Erlangen-Nuremberg (Germany)) Applications: General. Control: MIMD. Number of PEs: 25. Type of PE: i8086/i8087. Interconnection Network: Reconfigurable, message passing or shared memory. Memory Organization: Distributed, 64KB/PE, 1.2μs cycle, 1.66MB/s BW + Local, 320KB/PE RAM (500ns cycle) and 16KB ROM. Period: 1984 - 8-PE prototype; 1985 - 25 PEs. References: [265]

DOOM: (CR - Philips) Applications: General. Control: Object Oriented. Number of PEs: ~100. Type of PE: MC68020 in prototype; custom eventually. Interconnection Network: Generalized Chordal Ring Cube, packet switching (256-bit packets) through local Communication Processors (CPs), at 50K packets/s each. Memory Organization: Local, 4MB/PE. Host: Yes. Period: 1989 - 12-PE prototype. References: [324] [443]

DPS-1: (C - InterSystems) Applications: Graphics (+ General). Control: MIMD. Number of PEs: ~16. Type of PE: Zilog Z-80, 100ns CP. Interconnection Network: Bus, message passing. Memory Organization: Distributed, 64KB/PE (DPR), ~70ns cycle. Period: 1978/79 - available commercially. References: [111]

Dragon: (CR - Xerox PARC) Applications: General. Control: MIMD. Number of PEs: 1-10 general-purpose processors + display processor, map processor. Type of PE: Custom, RISC, 100ns CP: instruction fetch unit + execution unit + FP unit + cache. Interconnection Network: Bus ("M-Bus") + connection to VMEbus for IOPs, commercial boards; local ("P-Bus") to cache. Memory Organization: Global, 4-16MB. Performance: (~5MIPS + 1.5MFLOPS)/PE. Period: 1985 - prototype. References: [27] [294]

DSP-3: (CR - AT&T Bell Labs.) Applications: Signal Processing, Image Processing and Pattern Recognition. Control: MIMD. Number of PEs: ~128. Type of PE: AT&T DSP32C, 20ns CP. Interconnection Network: Reconfigurable (tree, linear, crossbar), based on 4-nearest-neighbor mesh, in 2 cycles, by 1 routing chip/PE. Memory Organization: Local, 1MB/PE. Performance: Peak: 25MFLOPS/PE (32-bit). Host: MC68030-based "real-time host" for IN configuration. Period: 1989 - prototype. References: [393]

DTM-1: (CR - MITRE Corp.) Applications: Image processing, graphics, pattern matching, signal processing, cryptography. Control: MSIMD - Heterogeneous, reconfigurable, systolic cells. Number of PEs: 16 custom VLSI devices, each 4096 cells. Type of PE: Custom Boolean processing elements. Interconnection Network: 2-D packed exponential cycles network, augmenting a nearest-neighbor mesh. Memory Organization: Distributed and reconfigurable. Performance: 64x64 8-bit Abingdon Cross in 5.12μs. Host: VME network access via an MC68020-based computer. Period: 1991 - prototype. References: [450]

DTN: (C - Dataflow Technology Nederland) Applications: Graphics. Control: Dataflow. Number of PEs: 2 subsystems: Graphics subsystem - 4 systolic arrays; Dataflow engine - 32 (8 boards x 8 PEs). Type of PE: In Dataflow engine: NEC ImPP (μPD7281), 100ns CP, 4/board + 1 MAGIC (NEC μPD9305) interface chip/board. Interconnection Network: VMEbus + high-speed time-sliced bus among subsystems, host; token ring inside board, 5M tokens (32-bit) per link/s. Memory Organization: Distributed, 1MW (32-bit)/board. Performance: 2-10 MIPS/PE. Host: Sun-3/UNIX, via VMEbus. Period: Built by 1990. References: [450]

EDDY: (CR - NTT) Applications: General. Control: Dataflow. Number of PEs: 16 (4x4). Type of PE: 2xZ8001, 16-bit circular pipeline: one for execution, one for packet routing. Interconnection Network: 8-nearest-neighbor Mesh + 2 broadcast control units. Token passing. Memory Organization: Local. Period: 1983 - prototype. References: [428] [429] [476]

EGPA: (A - U. of Erlangen-Nuremberg) Applications: General. Control: MIMD. Number of PEs: 5. Type of PE: AEG 60-80, 32-bit. Interconnection Network: Quadtree Pyramid. Memory Organization: Distributed. Period: 1979 - prototype; 1985 - 21-PE prototype realized using the DIRMU-25 kit (see separately). References: [60] [61] [137]

El'brus family (-1, -2, -3 & Micro-El'brus): (NL + C - Institute of Precision Mechanics and Computer Technology + Zagorsk Electro-Mechanical Factory) Applications: General + Real-Time Applications. Control: MIMD. Number of PEs: 1-10 (1, 2 & Micro); 1-16 (3). Type of PE: Custom, 64-bit (1, 2), 64-bit VLIW (3); 32-bit (Micro). Interconnection Network: Crossbar. Memory Organization: Shared, ≤2GB on El'brus-3 (+ local, 16MB/PE). Performance: 1.25MIPS/PE (1); 12.5MIPS/PE, 9.4MFLOPS/PE (2); 8MIPS/PE (Micro); 560MFLOPS/PE (3). Period: 1977 - El'brus-1 prototype, 1980 - state testing; 1985 - El'brus-2 prototype, 1986 - production; 1990 - Micro-El'brus prototype; 1992 - El'brus-3 nearing prototype stage. References: [470] [471]

ELI (-512): (A - Yale U.) Applications: General. Control: Multiple VLIW. Number of PEs: 16 clusters. Type of PE: Clusters of two types: 8 "M" - 32-bit fixed point with memory; 8 "F" - floating point. Interconnection Network: Chordal Ring: 1- & 6-away connections. Memory Organization: Local. Host: Yes. Period: 1984 - prototype; later - a commercial uniprocessor spin-off ("Trace"). References: [7] [126]

Elxsi 6400 Series: (C - Elxsi) Applications: General. Control: MIMD. Number of PEs: 1-12. Type of PE: Custom, ECL, 50ns CP + cache (16KB on 6410, 64KB on 6420, 1MB on 6460) + Vector Processor, 25ns CP on 6460. Interconnection Network: Bus ("Gigabus"), 110-bit wide (64 - data, 35 - control, 11 - parity), message passing, 320MB/s (25ns cycle). Memory Organization: Global, ≤8 x 256MB, 400ns cycle. Performance: Peak: 40MFLOPS on 6410, 120MFLOPS on 6420; measured (L100): 0.6MFLOPS/PE on 6410, 1.5MFLOPS/PE on 6420, 10MFLOPS/PE on 6460. Period: 1984 - 6410; 1986 - 6420; 1988 - 6460; 9/1989 - ceased manufacturing. References: [32] [145] [182] [183] [194] [215] [259] [299] [352] [443]
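The Gigabus rate quoted is what one 64-bit data transfer per 25ns bus cycle works out to; a quick check under that assumption:

```python
# Peak bus bandwidth = bytes moved per cycle / cycle time.
# Assumes one 64-bit data transfer per 25ns Gigabus cycle.
def bus_bandwidth_mb_s(data_bits: int, cycle_ns: float) -> float:
    return (data_bits / 8) / (cycle_ns * 1e-9) / 1e6

print(bus_bandwidth_mb_s(64, 25))   # 320.0 MB/s, matching the Gigabus figure above
```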

EM-3: (NL - Electrotechnical Laboratory (Japan)) Applications: LISP and Knowledge Bases. Control: Dataflow. Number of PEs: 16. Type of PE: Custom, with an execution unit based on MC68000. Interconnection Network: Router network, with a packet (96-bit) transfer time of 150ns; message passing. Memory Organization: Local, 128KB/PE. Performance: Measured: 10MIPS. Host: PDP-11/44 Control Processor. Period: 1984 - 8-PE prototype; 1985 - 16 PEs. References: [313] [438] [475]

EM-4: (NL - Electrotechnical Laboratory (Japan)) Applications: General, Symbolic Computation. Control: Dataflow. Number of PEs: 1024 (5 PEs/Group Board). Type of PE: EMC-R pipelined RISC processor, 80ns CP (switch unit, memory control unit, input buffer, fetch & match unit, execution unit). Interconnection Network: Ring between clusters; bus in the cluster (2MB/s), message passing. Memory Organization: Local, 2KB/PE. Performance: 1.4MIPS/PE. Host: PDP-11. Period: 1990 - 80-PE prototype operational. References: [348] [349] [350] [476]

EMMA: (C - ELSAG (Italy)) Applications: Pattern Recognition. Control: MIMD. Number of PEs: Several clusters, ≤128 PEs each. Type of PE: Custom, 16-bit microprocessor: Arithmetic Unit + Memory + I/O port. Interconnection Network: Point-to-point connections among regions, buses among families and inside families. Memory Organization: Hierarchically distributed: 7MB/region + 8MB/family + Local. Performance: 1MIPS/PE. Period: 1985 - prototype. References: [423]

EMMA-2: (C - ELSAG (Italy)) Applications: Signal Processing. Control: MIMD. Number of PEs: ≤1000 (several regions, made up of families of PEs). Type of PE: i80286 + semicustom coprocessor. Interconnection Network: 17x17 Intercom network (with memory) + broadcast option + bus. Memory Organization: Distributed, 17x12KW for Job Controller, 17x256W for PEs; a PE (k) writes to MMs on its row (MMkj) and reads from MMs on its column (MMik). Host: PDP-11 + Job Control Unit (for configuration). Period: 1981 - fully operational. References: [18]

Empress: (A - Swiss Federal Institute of Technology (ETH)) Applications: General. Control: MIMD. Number of PEs: 16. Type of PE: DEC LSI-11. Interconnection Network: Circular Ω Network, packet (31-bit address, 39-bit data, 7-bit control) switching; 80ns cycle; BW on 80-PE system: 14.6GB/s. Memory Organization: Local, ≤5.25MB/PE. Performance: 12.5MIPS/PE. Host: Sun-3/260 via VMEbus. Period: 1989 - 5-PE prototype; 4/1990 - 80-PE prototype. References: [69]

Encore Multimax Series: (C - Encore) Applications: General. Control: MIMD. Number of PEs: 2-20. Type of PE: NS32032 on 120, NS32332 on 320, NS32532 on 500; + NS FPA (e.g., NS32081) or Weitek FP chips + 64KB cache. Interconnection Network: Bus ("Nanobus"): split transaction: 64-bit data bus + 32-bit address bus + 14-bit interrupt bus; 80ns cycle, 100MB/s. Memory Organization: Global, several shared MMs - a card with 16MB, 100MB/s access rate. Performance: 1.5MIPS/2 PE (120); 4MIPS/2 PE (320); 170MIPS total (500); 1MFLOPS (64-bit) for the Weitek chip set. Host: System Control Board - 2-CPU board, NS32016 with 64KB ROM, 128KB RAM, running at 17MIPS. Period: 1985 - available commercially. References: [145] [194] [208] [215] [252] [259] [443]

ES-1: (C - Evans & Sutherland Computer) Applications: General. Control: MIMD. Number of PEs: 2-8 "processors," each 16 Computational Units (2 CUs/board). Type of PE: "CU": Custom, Scalar, 32-bit, 40ns CP, multiple-pipeline (3 FP + 2 Integer) + 16K-entry (128KB) I-Cache, 8K-entry (64KB) D-Cache. Interconnection Network: "Nonblocking" crossbar to memory (32-bit data paths) within "processor," crossbar among "processors." Memory Organization: Distributed, 256MB/"processor," 100ns cycle, available at 800MB/s within the processor. Performance: Peak: 12.5MIPS and 12.5MFLOPS (64-bit)/CU; tested: 15-23MFLOPS/16 CUs on Livermore Loops, 115-165MFLOPS/16 CUs on L1000. Period: 7/1989 - announced; 10/1989 - Beta sites; 11/1989 - closed. References: [311] [366] [367]

ES-2701: (NL + C - Institute of Cybernetics of the Ukrainian Academy of Science, Kiev) Applications: Numeric. Control: Object-Oriented. Number of PEs: 16-192 Arithmetic processors + 32 Control + 32 Switching processors. Type of PE: Custom. Interconnection Network: Reconfigurable, based on a multistaged network, BW: 1MB/s per link. Memory Organization: Local. Performance: 2.8MIPS/PE. Period: 1985 - prototype. References: [471]

ES-2703: (NL + CR - Taganrog Radio Engineering Institute (former USSR)) Applications: Numeric. Control: MIMD. Number of PEs: 1-256. Type of PE: Custom - 4 16-bit ALUs + 4 multipliers in each PE. Interconnection Network: Crossbar within PE; reconfigurable, based on a crossbar among PEs. Memory Organization: Shared + Local. Performance: Peak: 1GIPS. Period: 1990 - prototype with 16 PEs. References: [471]

ES-2704 + ES-2727: (NL + C - Leningrad Institute of Informatics and Information + NITsEVT (Scientific-Research Centre for Electronic Computer Technology - former USSR)) Applications: Functional Languages + General. Control: Dataflow. Number of PEs: 2704: 1-24; 2727: 1-72: 6 clusters x 12 PEs + 6 switching modules. Type of PE: Custom, 32-bit; 50ns CP on 2727. Interconnection Network: Reconfigurable, based on a switching network, using 12 specialized PEs. Memory Organization: Local, 256KB. Performance: 2704: 100MIPS; 2727: 750-1000MIPS. Period: 1985 - 2704; 1990 - 2727 under design. References: [471]

ESL Systolic Array: (CR - ESL, Advanced Processor Technology Laboratory) Applications: Signal and Image Processing. Control: SIMD (Systolic). Number of PEs: 1 - few tens. Type of PE: TRW multiplier-accumulator chip; 200ns CP, 16-bit fixed point. Interconnection Network: Linear. Memory Organization: Local. Performance: Peak: 10MOPS/PE; sustained: 130MOPS/20 PEs. Host: VAX-11/780. Period: 1981 - 20-PE array. References: [242]

ETA Series: (C - Eta Systems) Applications: General. Control: Multiple Vector Processors. Number of PEs: ≤2 on 10P, Q; ≤8 on 10E, F, G. Type of PE: Custom, including 2 FP pipelines; 24ns CP on 10P, 19ns CP on 10Q, 10.5ns CP on 10E, 8.4ns CP on 10F, 7ns CP on 10G. Interconnection Network: Via shared memory. Memory Organization: Global, ≤256MW (64-bit) available at 100Gbit/s + Local, 4MW/CPU. Performance: (model, Number of PEs: MFLOPS peak, MFLOPS/PE on L100, MFLOPS/PE on L300): 10P (2): 750, 25, 80; 10Q (2): 947, 31, 101; 10E (8): 6857, 56, 182; 10G8 (8): ~10286, ~84, 273; GF-30 (8): 30GFLOPS. Host: Yes. Period: available commercially from 1988 to 1989. References: [32] [182] [183] [194]

FAIM-1: (CR - Schlumberger Palo Alto Research) Applications: Artificial Intelligence. Control: Object Oriented. Number of PEs: 1 or more E-n surfaces (3n(n-1)+1 PEs each, n on each hexagon edge). Type of PE: Custom, including: Evaluation Processor (RISC, 20-bit ALU); Switching Processor (for context switching); Post Office; instruction memory and pattern-addressable memory. Interconnection Network: Hexagonal mesh (wrapped around to form a "torus"), 20-bit-wide data paths (+ 2 control wires); message passing. Memory Organization: Local, 1MW (28-bit: 20 - data, 8 - tag). Period: 1987 - 19-PE E-3 surface prototype. References: [12] [48] [196] [444]
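The E-n surface size is the centred hexagonal number 3n(n-1)+1, which matches the 19-PE E-3 prototype mentioned at the end of the entry; a small check:

```python
# Centred hexagonal numbers: PEs on an E-n surface with n PEs per hexagon edge.
def e_surface_pes(n: int) -> int:
    return 3 * n * (n - 1) + 1

assert e_surface_pes(3) == 19                     # the 19-PE E-3 prototype quoted above
print([e_surface_pes(n) for n in range(1, 6)])    # 1, 7, 19, 37, 61
```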

FASP: (C - Tridex) Applications: Signal Processing (Satellite-Based). Control: MIMD. Number of PEs: 9 (3x3). Type of PE: PE's data processing section: i80186 (16-bit) + i80130 (OS chip); signal processing section: TI TMS320; the sections communicate via a DPR with semaphores. Interconnection Network: 6 buses: 3 X-buses and 3 Y-buses in the array, message passing. Memory Organization: Local. Period: 1984 - 3-PE prototype; 1985 - 9-PE prototype. References: [21]

FDS-RII: (A - U. of Tokyo Institute of Industrial Science) Applications: Database. Control: MIMD. Number of PEs: 4 (1 Master + 3 Slaves). Type of PE: MC68020, 62.5ns CP. Interconnection Network: VersaBus + SO bus (results only). Memory Organization: Global, 6MB Staging Buffer + Local, 1MB/PE. Host: Yes. Period: Built by 1990. References: [227] [228]

FEM: (NL - NASA Langley Research Center) Applications: Finite Element Calculations. Control: MIMD. Number of PEs: 36 (6x6). Type of PE: TI9900 + AM9512 FP coprocessor. Interconnection Network: 8-nearest-neighbor Mesh, 1-bit data paths + 16-bit global bus. Memory Organization: Local, 32KB RAM + 16KB ROM. Host: TI990 controller. Period: 1983 - 4x2 prototype; 1984 - 4x4 prototype. References: [182] [183]

FERMATOR: (A - U. of Toronto) Applications: General. Control: MIMD. Number of PEs: Several "Stations," each containing several PEs. Type of PE: 1 mini (PDP-11/34 in the prototype) and the rest micros (MC6809 in the prototype). Interconnection Network: Ring among stations, bus (P-Bus, 24-bit, 60Mbit/s) in station; message passing. Memory Organization: Local. Period: 1984 - 3 stations, 6-PE prototype. References: [339]

FFP (Cellular): (A - North Carolina U.) Applications: Symbolic (FFP) Reduction. Control: MIMD. Number of PEs: ≤1M. Type of PE: Custom (very simple) - Leaves: CPU + memory (execution); internal nodes: also program loading and balancing. Interconnection Network: Binary tree with horizontal links at leaves; message passing. Memory Organization: Distributed. Period: 1985 - 15-PE prototype. References: [7] [8] [174] [445] [449]

Fifth Generation Computer: (C - Fifth Generation) Applications: Production System or Signal Processing (two versions). Control: Partitionable SIMD/MIMD. Number of PEs: 2^p - 1; p = 2-13. Type of PE: MC68020. Interconnection Network: Binary Tree. Memory Organization: Local. Performance: 14GIPS. Period: 1988 - available commercially.

Firefly I, II: (CR - DEC Research) Applications: General. Control: MIMD. Number of PEs: 5-9. Type of PE: μVAX (I), CVAX (II). Interconnection Network: Two buses: M-bus (PEs, memory), 100ns cycle, 10MB/s (4 bytes/400ns); Q-bus (devices), 3MB/s. Memory Organization: Global, 16MB (I), 128MB (II), available at 10MB/s. Period: 1985/86 - model I; 1987 - model II. References: [435]

Flagship: (A + C - Manchester U., Imperial College + ICL) Applications: Declarative Languages. Control: Graph Reduction. Number of PEs: 15. Type of PE: MC68020. Interconnection Network: Delta Network, 5MB/s per port (bus in prototype), message passing. Memory Organization: Distributed, 16MB/PE. Host: Sun-3/50 + host interface node. Period: 1988 - prototype. References: [330] [461]

Flexible Flex/32: (C - Flexible) Applications: General. Control: MIMD. Number of PEs: ~2480 (20 per cabinet). Type of PE: NS32032 + NS32081 or MC68020 + MC68881 (heterogeneous). Interconnection Network: Buses: 10 local asynchronous VME buses per cabinet; 2 common, synchronous VME buses. Memory Organization: Distributed, 1-4MB/PE. Performance: 3.5MIPS/PE, 4MFLOPS/PE peak, 1MFLOPS/PE measured on L100. Period: 1984 - available commercially. References: [194] [208] [215] [259] [279] [443]

FLIP: (NL - Research Institute for Information Processing and Pattern Recognition (Germany)) Applications: Image Processing. Control: Dataflow. Number of PEs: 16, in FIP (processing unit). Type of PE: Custom, 8-bit. Interconnection Network: 4 buses: 2 - data input; 1 - data output; 1 - Control and Instructions; 45MB/s on data buses; message passing. Memory Organization: Global, 768KB Image Memory + 8 MMs x 3KW in PEP (communication unit). Performance: 64MIPS (in FIP). Host: PDP-11/45. Period: 1980/81 - prototype. References: [147] [148]

Flosolver: (NL + C - National Aeronautical Laboratory + WIPRO (India)) Applications: Computational Fluid Dynamics, Aeronautical computations. Control: MIMD. Number of PEs: ~64 (16 "Nodes" x 4 PEs); 4 PEs on Mk 1; 2 nodes x 4 PEs on Mk 2; 16 PEs planned for "Mk X." Type of PE: i80386 + i80387 on Mk 2. Interconnection Network: Bus - IEEE P796 on Mk 2. Memory Organization: Global, 4 MMs (shared memory communication). Performance: 1.3MFLOPS on Mk 2 (5-8MFLOPS planned for Mk X). Host: One host per node. Period: 10/1987 - Mk 1; 7/1988 - Mk 2. References: [46]

FPS T-Series (2 generations): (C - FPS Computing) Applications: General. Control: Multiple Vector Processors. Number of PEs: 8-16,384 (in 8-node modules) + 1 System Unit per module. Type of PE: T800 Transputer + Vector Processor, 50ns CP. Interconnection Network: Hypercube: 1st generation - 125ns cycle, 0.5MB/s on internode link after a 5μs start-up, 2nd generation - 9.2MB/s; ring among system units. Memory Organization: Local, 2KB on-chip RAM (single cycle) + 1MB/PE (1st gen.) or 4MB (2nd gen.) off-chip multiported DRAM + 64KW x 32-bit instruction memory. Performance: 1.5MFLOPS + 7.5MIPS (1st gen.), 10MIPS (2nd gen.) per PE, 19MFLOPS/PE in vector mode. Host: DEC μVAX II. Period: 4/1986 - 1st generation; 1988 - 2nd generation; 1988 - production stopped; 10 installed, ~128 PEs. References: [8] [120] [135] [145] [160] [182] [194] [261] [288] [389]

FTMP: (NL + C - NASA Langley Research Center + Charles Stark Draper Research Laboratory) Applications: General (Airborne). Control: MIMD. Number of PEs: 3 effective (each a triplet for reliability) - 10 "real." Type of PE: Custom. Interconnection Network: Multiple buses. Memory Organization: Global. Period: 1979 - prototype. References: [244]

FTPP: (CR - Charles Stark Draper Research Laboratory) Applications: General. Control: MIMD. Number of PEs: ~192 in 4 clusters of 6 Network Elements (NEs) of 8 PEs (forming 64 Fault Masking Groups, each of 3 PEs) + 1 IOE. Type of PE: MC68020, 80ns CP on C1 model; MC68030, 50ns CP on C2 model; MC68030 or i80960 or DSP32C planned for C3 model. Interconnection Network: VMEbus within NE, full interconnection among NEs in cluster + equivalent-position-IOE links to 2 nearest clusters. Memory Organization: Local. Performance: 0.6-1.8MIPS/PE on C1; 0.96-2.88MIPS/PE on C2. Period: 1989 - 16-PE C1; 1989 - 4-PE C2, using optic interconnect among NEs. References: [168]

Fujitsu Kabu-Wake: (CR - Fujitsu) Applications: Inference. Control: MIMD. Number of PEs: 16 (15 + 1 for I/O). Type of PE: MC68010. Interconnection Network: Two Nets: Control (job requesting) - Ring; Data (job transferring) - multistaged switching network. Memory Organization: Local. Performance: 1KLIPS/PE. Period: 1986 - prototype for the PIM. References: [48] [443] [444]

GAM I, II: (A - George Mason U.) Applications: Image Processing. Control: SIMD. Number of PEs: GAM I: 341 (16x16 + 8x8 + 4x4 + 2x2 + 1x1) + 85 adders (augmenting pyramid on all levels but lowest); GAM II: 1365 (GAM I + 32x32 6th level). Type of PE: 1-bit, same as the MPP's (see separately), 8 per chip. Interconnection Network: Quadtree Pyramid + 4-nearest-neighbor torus on lowest 3 levels. Memory Organization: Local, 8Kbit/PE. Host: AT&T 6300 PC. Period: 1986 - GAM I; GAM II built by 1989. References: [3] [358]
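The GAM PE counts are sums of squares of the pyramid level widths; a quick check using the widths given in the entry:

```python
# PEs in a quadtree pyramid = sum of width^2 over the levels.
def pyramid_pes(widths):
    return sum(w * w for w in widths)

assert pyramid_pes([16, 8, 4, 2, 1]) == 341        # GAM I
assert pyramid_pes([32, 16, 8, 4, 2, 1]) == 1365   # GAM II (adds the 32x32 level)
```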

Gamma: (A - U. of Wisconsin) Applications: Database. Control: Dataflow. Number of PEs: 17 (1 host, 8 with disk for Selection/Update of relations, 8 without: 1 for load balancing and 7 for Join, Project, and Aggregate operations). Type of PE: VAX-11/750. Interconnection Network: Token Ring, 80Mbit/s, message passing. Memory Organization: Local, 2MB/PE (+ 333MB disk on 8 PEs). Host: VAX-11/750 + UNIX. Period: 1985 - prototype implemented on the "Crystal" testbed. References: [104] [105]

GAPP (Geometric Arithmetic Parallel Processor): (C - Martin Marietta/NCR) Applications: Image Processing. Control: SIMD. Number of PEs: ≤82,944 (72 PEs/chip, 8x11 chips/board). Type of PE: Custom, 1-bit full adder/subtractor + 2 communication registers, 100ns CP. Interconnection Network: 4-nearest-neighbor mesh, via communication registers. Memory Organization: Local, 128x1-bit/PE. Performance: 28.8MIPS/72 PEs (8-bit +&-). Host: VAX, PC-AT, or Sun. Period: 1982 - breadboard 6x12 array; 1983 - 3x6 GAPP I; 1984 - 6x12 GAPP II; 1985 - full system. References: [81]

GF-11: (CR - IBM T. J. Watson Research Center) Applications: Quantum Chromodynamics. Control: SIMD. Number of PEs: 576 (512 + 64 for backup) - 10 are disk units. Type of PE: 2x(Weitek WTL1032 FP multiplier + WTL1033 ALU) pipelined functional units + fixed-point ALU, 50ns CP. Interconnection Network: "Memphis Switch": Benes network, made up of 24x24 crossbar elements with 9-bit-byte data paths, controlled by an IBM 3084. BW is 20MB/s on each path, for a total of 11.5GB/s. Reconfiguration time is 200ns for each of 1024 preloaded configurations. Memory Organization: Local, 2MB/PE at 200ns access + 64KB SRAM at 50ns access. Performance: Peak: 11GFLOPS, sustained: 10GFLOPS, i.e., 20MFLOPS/PE (in pipelined mode) + 20MIPS in ALU. Host: IBM 3090 host + IBM PC/RT controller, which feeds a 512KW (256-bit) instruction store. Period: 1989 - operational. References: [7] [8] [41] [194] [261]
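Both aggregate figures quoted for the GF-11 follow from the per-path and per-PE numbers, assuming all 576 paths/PEs are counted (the 512 active PEs account for roughly the sustained figure); a short consistency check:

```python
# GF-11 aggregates from the per-PE / per-path numbers quoted above.
paths, mb_per_path = 576, 20
pes, mflops_per_pe = 576, 20

print(paths * mb_per_path / 1000)   # ~11.5 GB/s total switch bandwidth
print(pes * mflops_per_pe / 1000)   # ~11.5 GFLOPS; 512 active PEs give the ~10 GFLOPS sustained
```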

GRID: (C - General Electric (UK)) Applications: Image Processing. Control: SIMD. Number of PEs: 4096 (64x64) + 1 Scalar Processor. Type of PE: Custom, 1-bit (4x8/chip), 100ns CP; MC68000 Scalar Processor. Interconnection Network: Single-twist 4-nearest-neighbor torus + X & Y buses. Memory Organization: Local, 64-bit dual-ported RAM per PE on chip, 64Kbit off-chip RAM. Performance: 1.6GOPS (8-bit add); 0.4GOPS (8-bit multiply); 45MFLOPS (32-bit multiply). Host: Plexus P60 + UNIX. Period: 1987 - chip version. References: [318] [338] [345]

GRIP: (A - University College, London) Applications: General. Control: Dataflow. Number of PEs: 84. Type of PE: MC68020 + FP coprocessor. Interconnection Network: Bus (IEEE Futurebus), packet switching. Memory Organization: Global, ≤21 IMUs + Local, 0.5MB/PE. Host: Orion with UNIX, via 16-bit DMA channel + System Manager PE. Period: 1987 - prototype. References: [323] [443] [444]

H4400: (CR - Hughes Aircraft) Applications: General. Control: MIMD. Number of PEs: ≤8: 1-7 Arithmetic Control Processors (ACPs), 1-7 I/O Processors (IOPs), and 0-6 Special Purpose Processors (SPPs). Type of PE: Custom, 32-bit. Interconnection Network: Via multiported memory (through a multiported Memory/Processor Switch). Memory Organization: Global, ≤1MB (9-bit byte): 16MMs x 16KW x 36-bit, 1.4μs cycle, 0.48μs access. Performance: 0.6MIPS/PE. Period: 1970 - 2-ACP + 2-IOP prototype. References: [112] [192]

HAP Series: (CR - NTT Electrical Communications Labs) Applications: General. Control: Partitionable SIMD/MIMD. Number of PEs: ≤4096 in the execution array layer; ≤64 in the control array layer; 1 System Management Processor. Type of PE: i8086/8087 in model I, i80286/80287 in model II, i80386/80387 in model III. Interconnection Network: Torus for inner-layer connections, bus for interlayer connections, packet switching on all levels (4-bit x 4 for inter-PE, 4-bit x 2 for interlayer, 16-bit x 1 for I/O - except III, with 32-bit x 1). Memory Organization: Local, 256KB/PE (I), 0.5-2MB/PE (II), 1-4MB (III). Performance: Per PE: 0.8MIPS + 0.04MFLOPS (I), 1.6MIPS + 0.08MFLOPS (II), 4MIPS + 0.45MFLOPS (III). Host: Yes. Period: 1986 - 256-PE prototype. References: [293]

HAPPE: (C - Honeywell) Applications: Signal Processing. Control: SIMD. Number of PEs: 3. Type of PE: Custom, MSI. Interconnection Network: 2 buses - to correlation control and to arithmetic control. Memory Organization: Local. Period: 1973 - demonstrator. References: [277]

Harpy: (A - CMU) Applications: Speech Recognition. Control: MIMD. Number of PEs: 5 + Memory Computer. Type of PE: DEC LSI-11, 16-bit, 150-200ns CP. Interconnection Network: Bus + Memory Computer. Memory Organization: Global (4Mbit) + Local (8Kx16-bit). Period: 1980 - prototype. References: [53]

HBA-I, II: (C - Hughes Aircraft) Applications: Image Processing. Control: MIMD (SPMD). Number of PEs: 24. Type of PE: MC68000 on HBA-I; MC68020/68881, 83.3ns CP on HBA-II. Interconnection Network: 3 Buses: 2 - Video, 1 - Communication, 16-bit wide, message passing. Memory Organization: Local, 400KB/PE on HBA-I, 912KB/PE on HBA-II. Performance: 24MIPS (HBA-II). Host: Sun 3/170 (HBA-II). Period: 1987 - available commercially. References: [458]

HC16-186: (A - Norwegian Institute of Technology) Applications: Database. Control: MIMD. Number of PEs: 16. Type of PE: i80186, 100ns CP. Interconnection Network: 5-D Hypercube (1 connection to host), 16-bit data paths, connections via 2KB Dual Ported RAM, 2.5MB/s on each link (20MB/s total BW). Memory Organization: Local, 1MB/PE + 65MB disk. Host: Yes. Period: 1989 - first run. References: [67]

HCRC: (A - Hybrid Computer Research Centre & The Institute for Research in Cosmic Physics and Relative Technology) Applications: General. Control: MIMD. Number of PEs: 4^L. Type of PE: 3 parts: Processing Subsystem (PCS) - Scalar Processor, T800-20; Vector Processor including add/logic pipeline, multiply pipeline, divide pipeline, vector register bank & multiported RAM; Kernel Subsystem (KNS) - T414; Communication Subsystem (CMS) - 2xT414. Interconnection Network: WK-Recursive network, where N = K^L (N = No. of PEs; K = node degree; L = expansion level); K virtual nodes, each connected by K - 1 links to the other nodes, with one extra link to the environment. Memory Organization: Local: 2MB DRAM + 128KB SRAM on Scalar Processor; 1MB DRAM on KNS; 32KB SRAM on CMS. Period: 1989 - prototype. References: [52]
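The PE count grows as K^L for the WK-recursive topology; a minimal sketch with K = 4, the node degree implied by the 4^L figure above:

```python
# WK-recursive network size: N = K**L for node degree K and expansion level L.
def wk_nodes(k: int, level: int) -> int:
    return k ** level

print([wk_nodes(4, L) for L in (1, 2, 3)])   # 4, 16, 64 PEs as the network is expanded
```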

HDM: (CR - Mitsubishi) Applications: Database. Control: MIMD. Number of PEs: ~17: 1 - Master, the rest - slaves. Type of PE: MC68020. Interconnection Network: Bus + direct interrupt lines. Memory Organization: Global, 4MB, at the master, accessible at 10MB/s + Local, 4MB/slave PE (+ 170MB local disk). Host: Melcom 70 MX/3000, connected via a Melnet R32 Ring Bus. Period: 1989 - 5-PE prototype. References: [302]

HDVSP: (CR - NEC C&C Systems Research Laboratories) Applications: Signal Processing/Image Processing. Control: MIMD. Number of PEs: 128 (8 clusters x 16 PEs). Type of PE: Custom. Interconnection Network: 3 buses within cluster (input, output, feedback), 2 among clusters (I/O, feedback). Memory Organization: Shared + Local. Period: 1990 - prototype. References: [308]

Hector: (A - U. of Toronto) Applications: General. Control: MIMD. Number of PEs: Several stations, each of several PEs. Type of PE: MC88100, 50ns CP + ≤4 MC88200, 16KB cache. Interconnection Network: 2-level hierarchy of rings, connecting buses within stations, passing synchronized packets. Memory Organization: Distributed, 16MB/PE. Period: 1991 - prototype (several stations). References: [456]

Honeywell 6000 Series; 6180 Multics: (C - Honeywell Information Systems) Applications: General. Control: MIMD. Number of PEs: 4 (6000 Series); 6 (6180). Type of PE: Custom, 36-bit. Interconnection Network: Via multiported memory (through a system controller). Memory Organization: Global, shared variables, ≤2MB, 1.2μs (6050/6060), ≤4MB, 0.5μs cycle (6070/6080), ≤8MB, 0.5μs (6180). Performance: 550KIPS/PE (6050/6060), 1.4MIPS/PE (6070/6080, 6180). Period: 1971 - available commercially (6050/6060/6080). References: [185]

Hughes Systolic/Cellular Array Processor: (CR - Hughes Research Labs.) Applications: General, Neural Networks. Control: SIMD. Number of PEs: 256 (16x16). Type of PE: Custom, fixed-point, 32-bit, with 2 adders, 2 multipliers, divider. Interconnection Network: 4-nearest-neighbor torus. Memory Organization: Local, 24W/PE + 2KW (256-bit) DPR available to top & bottom rows. Host: Yes. Period: 1989 - prototype built. References: [207] [386]

HyperFlo: (C - PC/M) Applications: General. Control: Dataflow. Number of PEs: Several MPUs; MPU-1: 4 PEs (2 with FP coprocessors); MPU-2: 5 PEs (all with FP coprocessors). Type of PE: MC68020 on MPU-1; MC68030 on MPU-2. Interconnection Network: Bus: VMEbus among MPUs, 80MB/s; local bus in MPU. Memory Organization: Global "Data Channels" & Local (message passing). Period: Introduced - late 1980s. References: [127]

iAPX-432: (C - Intel) Applications: General. Control: MIMD. Number of PEs: 1-4 General Data Processors (GDPs) + 1-3 Interface Processors (IPs) (for device handling). Type of PE: GDP: 43201 (instruction fetch & decode) + 43202 (ALU + addressing); IP: 43203 + I/O channel. Interconnection Network: Bus, message passing (packets of 1-6 bytes) + broadcasting. Memory Organization: Global. Period: 1984 - available commercially. References: [87]

IBM 3081KX, 3084QX: (C - IBM) Applications: General. Control: MIMD. Number of PEs: 2 (3081), 4 (3084). Type of PE: Custom, 24ns CP. Interconnection Network: Via multiported memory. Memory Organization: Global, ≤64MB (3081), ≤128MB (3084), 312ns access time. Performance: 19MIPS (3081), 29MIPS (3084). Period: Early 1980s. References: [32]

IBM 3090 Series: (C - IBM) Applications: General. Control: Multiple Vector Processors. Number of PEs: 2, 4, 6 (-/200, -/400, -/600 models). Type of PE: Custom, 18.5ns CP + 2-cycle 64KB cache + Vector Facility (VF models). Interconnection Network: Via multiported memory. Memory Organization: Global, 64+128MB (/200), 128+256MB (/400), 192+384MB (/600), 11-cycle access. Performance: Peak: 45MIPS (/200), 80MIPS (/400); measured: 12MFLOPS/PE on L100, 33MFLOPS/PE on L300. Period: 1985 - available commercially. References: [8] [40] [194]

IBM PVS (Power Visualization System): (C - IBM) Applications: Graphics, General. Control: MIMD. Number of PEs: 8-32. Type of PE: Intel i860, 25ns CP, 4KB I-cache, 8KB D-cache. Interconnection Network: Bus, 256-bit, BW: 1.28GB/s, sustained. Memory Organization: Shared, 128-1024MB, available at 1.28GB/s + Local, 16MB/PE. Performance: 1.28GFLOPS peak; tested: 310MFLOPS on L1000, 925MFLOPS on L6000 (64-bit). Host: IBM RS/6000 Gateway via HiPPI Channel. Period: 11/1991 - available commercially. References: [304]

IBM SP1 (Scalable POWERparallel System): (C - IBM) Applications: General. Control: MIMD. Number of PEs: 8-64. Type of PE: IBM RISC System/6000, 62.5MHz. Interconnection Network: Multistaged, 4x4 bidirectional switches. Memory Organization: Distributed, 64-256MB/PE. Performance: 1-8GFLOPS peak. Network switch: 40MB/s peak, 500ns switch latency. Host: IBM RS/6000 Gateway. Period: 1993 - available commercially. References: [198]

IDATEN: (CR - Fujitsu Labs.) Applications: Image Processing. Control: MIMD. Number of PEs: ≤14. Type of PE: Custom, 8-bit, 100ns/pixel. Interconnection Network: 16x16 Benes Network, circuit switched (configurable by the controller), 12-bit wide. Memory Organization: Global, 512x512x8-bit. Host: MC68000-based + 1.5MB RAM, connected to the IP section via an IEEE-796 bus. Period: 1984 - prototype. References: [155] [353]

ILLIAC IV: (A + C - U. of Illinois, Urbana + Burroughs) Applications: General. Control: SIMD. Number of PEs: 64 (planned for 256). Type of PE: Custom, 64-bit, ~80ns CP. Interconnection Network: Single-twist torus + 3 buses to the Control Unit - CU bus to PEM, Common Data Bus (CDB), and Mode-bit line. Memory Organization: Local, 2KW (64-bit), 240ns cycle, 350ns/W access time. Performance: 4.5MIPS/PE, 7.8MFLOPS/PE peak (500MFLOPS total), 40MFLOPS, 200MIPS sustained. Host: Burroughs B6700. Period: Operational in 1972. References: [8] [43, pp/320-333] [65] [112] [123] [195] [403] [418]
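The 500MFLOPS total is just the per-PE peak times the 64-PE array, and comparing it with the sustained figure shows how far typical codes fell short; a trivial check:

```python
# ILLIAC IV aggregate peak from the per-PE figure quoted above.
pes, peak_per_pe_mflops = 64, 7.8
peak_total = pes * peak_per_pe_mflops
print(round(peak_total))           # ~499, i.e. the ~500MFLOPS total peak
print(round(40 / peak_total, 2))   # sustained 40MFLOPS is under 10% of that peak
```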

IP: (CR - Hitachi) Applications: Image Processing. Control: MIMD. Number of PEs: 16. Type of PE: Custom, 12-bit (8-bit - data, 4-bit - spread msb), 350ns/instruction. Interconnection Network: Linear, via a local register array. Memory Organization: Local, 256x16-bit/PE + Picture Memory buffer. Host: A minicomputer. Period: 1977 - prototype. References: [281]

IP-1: (C - International Parallel Machines) Applications: General. Control: MIMD. Number of PEs: 9 - 1 master + 8 PEs; later: ~33. Type of PE: Custom, 32-bit. Interconnection Network: Crossbar. Memory Organization: Global, ~40MB. Period: 1985 - available commercially; 1987 - 16-PE model. References: [259]

Intel iPSC, iPSC/2 & iPSC/860: (C - Intel Supercomputer Systems (Intel Scientific Computers)) Applications: General. Control: MIMD or Multiple Vector Processors ("VX" version). Number of PEs: 16-128 (≤64 on MX, VX versions). Type of PE: i80286/287, 125ns CP on iPSC; i80386/387 on iPSC/2; either can have Weitek WTL1167 chips ("SX" version) or vector board ("VX" version); + 82586 LAN coprocessor; i860 on iPSC/860, 25ns CP. Interconnection Network: Hypercube. iPSC: peak BW (per channel): 10Mbit/s, measured latency: 1.7ms, transmission time: 2.83μs/byte, BW: 2.8Mbit/s; on iPSC/2: measured latency: 0.58ms; BW (per channel): 2.8MB/s. Vector board (on adjacent slot to CPU) is connected via a local bus (iLBX); special routing chips on iPSC/2 & iPSC/860. Memory Organization: Local, DPR, 512KB/PE (iPSC) + 4MB/PE (MX version) or 1MB/PE (VX version), 16MB/PE (iPSC/2). Performance: iPSC: 100MIPS total, 60KFLOPS/PE (with i80287); iPSC/2: 300KFLOPS/PE (with i80387), 900KFLOPS/PE (on SX version); 6.67MFLOPS/PE (double precision), 20MFLOPS/PE (single precision) on VX version; iPSC/860: 60MFLOPS/PE (double precision), 80MFLOPS/PE (single precision). Host: i286/310 + XENIX (iPSC). Period: 1985 - iPSC; 1988 - iPSC/2; 1990 - iPSC/860. References: [8] [20] [145] [194] [202] [208] [215] [261] [292] [317] [389] [443]
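The measured iPSC channel bandwidth is the reciprocal of the per-byte transmission time quoted above (not the 10Mbit/s hardware peak); a one-line check:

```python
# Effective iPSC channel bandwidth from the measured per-byte cost.
us_per_byte = 2.83
mbit_per_s = 8 / (us_per_byte * 1e-6) / 1e6
print(round(mbit_per_s, 1))   # ~2.8 Mbit/s, matching the measured BW quoted above
```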

ITT CAP I, II: (CR - ITT Advanced Technology Center) Applications: General. Control: SIMD. Number of PEs: 16 (+ 4 PEs for fault tolerance), on one chip in CAP I; 16 chips (4x4 array) each with 16 PEs (+ 4 for fault tolerance) in CAP II - each chip is configurable as 4 x 64-bit PEs or 8 x 32-bit PEs or 16 x 16-bit PEs. Type of PE: Custom, 1-bit, 667ns CP (CAP I); custom, 16-bit RISC bit-slice, 80ns CP (CAP II). Interconnection Network: X and Y buses along chip-array rows/columns. Linear within chip. Memory Organization: Global, partially shared (1 module for each row and each column) + Local, 1KW/PE on-chip RAM (CAP II), 512KB/16-PE array off-chip RAM. Performance: For CAP II: ≥1000MIPS (32-bit fixed point), ≥2000MIPS (16-bit fixed point); 1MFLOPS/PE (32-bit, i.e., 128 PEs), 0.375MFLOPS/PE (64-bit, i.e., 64 PEs). Host: Yes + Controller & scalar processor. Period: 1985 - prototype of CAP I. References: [297] [298]

IUA: (A + CR - U. of Massachusetts + Hughes Research Lab.) Applications: Knowledge-based Computer Vision & Processing. Control: SIMD at bottom level; MIMD/SPMD at intermediate level; MIMD at top level. Number of PEs: ≤256K (512 x 512) PEs in bottom level (Content Addressable Array Parallel Processor - CAAPP); ~1K PEs (32 x 32) in intermediate level (Intermediate and Communication Associative Processor - ICAP); ≤64 in top level (Symbolic Processing Array - SPA). Type of PE: 1-bit in CAAPP; TI TMS320C40 32-bit DSP chips in ICAP; 4-processor SGI multiprocessor, MIPS-based, 32-bit in SPA. Interconnection Network: 2-D mesh + reconfigurable mesh in CAAPP; fully connected Quadnodes linked by ring + bus for shared memory in ICAP; bus in SPA. Memory Organization: DPR memory shared between levels + local in CAAPP, local (program) & distributed shared in Quadnodes in ICAP. Performance: DARPA IU Benchmark: 0.83 (planned for full configuration); peak: 16G pixel ops/s on 16K-PE CAAPP; 32GFLOPS for 64-PE ICAP. Host: VMEbus-based machine. Period: 8/1991 - 1/64 model (4K CAAPP PEs + 64 ICAP PEs + 1 SPA PE); larger model (16K CAAPP PEs, 64 ICAP PEs, 4 SPA PEs) under construction in 9/1992. References: [462] [463]

IXM: (NL - Electrotechnical Laboratory (Japan)) Applications: Semantic Network Processing. Control: MIMD. Number of PEs: 32. Type of PE: T800 Transputer. Interconnection Network: Reconfigurable: tree (3/4-ary) or hypercube. Memory Organization: Shared Associative Memory, 128KWx40-bit + Local: 4KB on-chip (57ns cycle), 32KWx32-bit (230ns cycle). Performance: 1880MOPS; 8.75MIPS/PE. Host: Sun 3/260. Period: 1989 - prototype built. References: [176]

IXM2: (NL + A - Electrotechnical Laboratory (Japan) + U. of Tsukuba) Applications: Semantic Network Processing. Control: SIMD. Number of PEs: 64 Associative Processors (APs) + 9 Network Processors (NPs) in 8 Processing Modules (PMs), each 8 APs + 1 NP, + 1 NP for host connection. Type of PE: AP - T800 transputer; NP - T800 transputer + 16 links, 57ns CP. Interconnection Network: Complete interconnection between APs within PM (4 via transputer links and 4 via bus), complete interconnection among PMs. Memory Organization: Local, 4KB on-chip RAM and 32Kx32-bit off-chip RAM per PE, 57ns cycle. Host: Sun 3. Period: 1990 - prototype installed at CMU. References: [177]

K2: (A - Swiss Federal Institute of Technology (ETH)) Applications: General. Control: MIMD. Number of PEs: ≤24 - 16 Computation Nodes (CNs) + 8 I/O nodes (IONs). Type of PE: CN: AMD29000 processor + AMD29027 FP coprocessor + MC68030 Serial Interface Network Controller (SNIK) with 4 out & 4 in channels. Interconnection Network: 2 4-nearest-neighbor meshes: user, system; each 32-bit wide. User channel for application raw data, routed manually by the CPU, blocking or nonblocking, 400MB/s BW, 200ns latency, message passing via queues. System channel for system data, routed automatically by SNIK, nonblocking, 100MB/s BW, 1.5μs latency. Memory Organization: Local, CN: 2MB/PE instruction memory + 8MB/PE data memory; ION: 2MB instruction memory + 8MB data memory, 40ns cycle. Host: Sun server. Period: 1991 - prototype. References: [16] [17]

KSR1: (C - Kendall Square Research) Applications: General. Control: MIMD. Number of PEs: 8-1088 (34 rings of 32 PEs). Type of PE: Custom, superscalar (2 instructions/cycle), 64-bit RISC, 50ns CP: Cell Execution Unit (CEU) + FPU + integer and logical operations unit (IPU) + I/O channel (XIU) + 256KB I-cache + 256KB D-cache. Interconnection Network: Hierarchical rings, passing 128B memory subpage packets: bottom level ("Search Engine:0" or SE:0) one-way ring connecting 8-32 PEs, BW - 8M packets/s (1GB/s); next level "SE:1" one-way ring connects ≤34 SE:0s, BW - 8, 16, or 32M packets/s (1-4GB/s); connection of a PE/SE to the ring is done via a directory of the memory in that PE/SE. Memory Organization: Distributed, single address space (≤1TB), with each localized unit acting as a cache ("ALLCACHE"); 32MB/PE. Performance: Peak: 20MIPS + 40MFLOPS/PE; tested: 6.6MFLOPS/PE on Livermore Loops; 15MFLOPS/PE on L100, 28MFLOPS on FFT; 32MFLOPS/PE on Matrix Multiply. Period: 2/1992 - announced; 1H92 - beta site. References: [42] [71] [156]
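The SE:0 and SE:1 bandwidths quoted are the packet rates times the 128-byte subpage packet size; a quick check:

```python
# KSR1 ring bandwidth = packets per second * bytes per packet.
def ring_gb_s(mpackets_per_s: float, packet_bytes: int = 128) -> float:
    return mpackets_per_s * 1e6 * packet_bytes / 1e9

print(ring_gb_s(8))    # ~1.0 GB/s for an SE:0 ring
print(ring_gb_s(32))   # ~4.1 GB/s for the fastest SE:1 configuration
```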

Kyushu University Reconfigurable Parallel Processor (KRPP): (A - Kyushu U.) Applications: General. Control: MIMD. Number of PEs: 128. Type of PE: SPARC MB86900/10 + Weitek WTL1164/67 FP coprocessor or Abacus 3170 FPU, 60ns CP. Interconnection Network: One or more crossbars, SW controlled, PE-PE and PE-MM: 256 8x8 crossbar switches, byte-wide data paths at 10MB/s, bidirectional + buses along rows & columns of switches. Allows broadcasting. Memory Organization: Local or distributed (reconfigurable), 8MB/PE. 4GB address space: lower 2GB - private, upper 2GB - shared, implemented as 256 8MB Shared Memory Windows; each page - private or in an SMW. Performance: 10MIPS/PE peak, 205MFLOPS (32-bit), 141MFLOPS (64-bit) measured on L300. Period: 1989 (fall) - 128-PE machine. References: [140] [300]
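The shared upper half of the address space is exactly covered by the Shared Memory Windows; a one-line consistency check:

```python
# KRPP shared address space = number of windows * window size.
windows, window_mb = 256, 8
print(windows * window_mb / 1024)   # 2.0 GB, i.e. the upper half of the 4GB address space
```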

Landmark II: (C - WIPRO (India)) Applications: General. Control: MIMD. Number of PEs: ≤6. Type of PE: i80386. Interconnection Network: Multibus II. Memory Organization: Global. Period: 1989 - announced. References: [46]

LAP: (NL + C - National Physics Laboratory + British Robotics Systems) Applications: Image Processing. Control: SIMD. Number of PEs: ≤256. Type of PE: Custom, 1-bit. Interconnection Network: Linear. Memory Organization: Local, 256-bit/PE. Host: Yes + Controller. Period: 1982 - prototype. References: [345]

LAU: (NL - CERT (France)) Applications: General. Control: Dataflow. Number of PEs: 32. Type of PE: AMD Am2900-based 16-bit processor. Interconnection Network: Multiple buses to memory; message passing. Memory Organization: Global. Period: 1979 - first prototype; 1982 - full prototype. References: [8] [427] [445] [449]

LDF-100: (C - Loral Instrumentation) Applications: General (Data Analysis). Control: Dataflow. Number of PEs: 1-256. Type of PE: NS32016. Interconnection Network: Two buses: 32-bit tokens on FLObus; messages (≤16KB) on LDFbus (devices, host); nearest-neighbor messages: latency: 1ms, BW: ~1.2μs/byte; to host - 8 times slower. Memory Organization: Local, 128KB/PE. Performance: ~1MIPS/PE. Host: Yes. Period: 1985 - available commercially. References: [108] [214] [259]

LEMUR: (NL - Argonne NL) Applications: General. Control: MIMD. Number of PEs: 8. Type of PE: NS32016 + NS32081 FP unit. Interconnection Network: 4-nearest-neighbor PE-MM connections only. Memory Organization: Global, 1MB/MM (8 MMs) + Local, 128KB/PE. Performance: 3.3MIPS. Host: VAX-11/780. Period: 1983 - both machines built. References: [80]

Leopard-I, II: (A - U. of Adelaide) Applications: General. Control: MIMD. Number of PEs: ≤6. Type of PE: Three types: General Data, Special Data, Device; based on NS32032 in Leopard-I, NS32532 + FPU coprocessor on Leopard-II. Interconnection Network: Bus (IEEE Futurebus on Leopard-II). Memory Organization: Global: 2MB on Leopard-I, 16MB on Leopard-II + Local (1MB on Leopard-I, for bootstrapping & kernel). Period: 1987 - Leopard-I (3 GD processors). References: [24]

LINKS-1: (A - Osaka U.) Applications: Graphics. Control: MIMD. Number of PEs: 64 + "Root Computer." Type of PE: Control Unit (Z8001), Arithmetic Processing Unit (i8086/87), Memory Unit & Interprocessor Memory Switching Unit. Interconnection Network: Bus. Memory Organization: Local, 1MB in Memory Unit, 256KB in Arithmetic Processing Unit. Host: Yes. Period: Built around 1983. References: [307] [313]

LSM: (CR - IBM Los Gatos) Applications: Logic Simulation. Control: MIMD. Number of PEs: 64 (1 Control Processor, X Logic Processors, 63-X Array Processors). Type of PE: Custom, 1-bit logic unit. Interconnection Network: Crosspoint, 64x64. Memory Organization: Local, 1KWx80-bit instruction memory; 5 copies of 2 banks of 2KWx2-bit data memory; 1KWx64-bit function memory; 1KWx16-bit delay-value memory; array processors have an additional 512KWx36-bit RAM simulation memory. Performance: 640M logic expressions per second. Host: Yes + Interface Computer. Period: 1980 - 2 prototypes. References: [70] [189]

LUCAS: (A - Lund U. (Sweden)) Applications: General. Control: SIMD. Number of PEs: 128. Type of PE: Bit-serial. Interconnection Network: Shuffle/Exchange, 1-bit data paths. Memory Organization: Local, 4Kbit. Host: Master Processor, Z80. Period: 1982 - fully implemented. References: [124]

M3: (A - Swiss Federal Institute of Technology (ETH)) Applications: General. Control: MIMD. Number of PEs: 9. Type of PE: MC68000. Interconnection Network: Hierarchical buses: 1st level - PE-MEM bus; 2nd level - cluster bus; 3rd level - intercluster bus. Memory Organization: Shared, within each level + Local, 1MB/PE. Period: 1988 - prototype built. References: [72]

M5PS: (A - Aachen Technical U. (Germany)) Applications: General. Control: MIMD. Number of PEs: Clusters of 8 PEs. Type of PE: Z80 (8-bit). Interconnection Network: Hierarchical buses: node bus, cluster bus, intercluster bus. Memory Organization: Global (shared by PEs on the same bus) + local. Shared variables (spinlock, semaphores) supported within a cluster; message passing (≤140 bytes) supported between clusters. Period: 1984 - prototype. References: [287]

M-1 Cellprocessor: (C - Cellware Ltd. (Hungary)) Applications: Image Processing. Control: MIMD. Number of PEs: 256 (16×16). Type of PE: Custom, 4-bit, 100ns CP. Interconnection Network: 2-D array. Memory Organization: Local, 4K×4-bit/PE Transition Function Memory. Host: PC-AT + Controller which chooses one of 16 Transition Functions to be used in the following cycle. Period: Built by 1990. References: [477]

M-1800/ab (a = No. of CPUs, b = "0" denotes <1GB main memory, "5" denotes ≤2GB): (C - Fujitsu) Applications: General. Control: MIMD. Number of PEs: ≤8. Type of PE: Custom, ECL technology + 128KB Buffer Storage. Interconnection Network: Via shared memory. Memory Organization: Global, ≤2GB main storage (40ns cycle) + ≤8GB System Storage Unit (100ns cycle). Performance: ~300 MIPS/8 PEs (estimated peak). Period: 8/1990 - announced; 1991 - available commercially. References: [369]

M64 Series ("MAX"): (C - FPS Computing) Applications: General. Control: SIMD (-like). Number of PEs: 3-31 (M64/145); ≤15 (M64/140); 2 PEs/"MAX" board. Type of PE: Custom, 182ns CP, 2 operations/CP. Interconnection Network: Bus. Memory Organization: Global, 1MW (8MB) + Local; Total: ≤29 boards, 1MW (8MB)/board. Performance: Peak: 11MFLOPS/PE; measured (31 PEs): 9MFLOPS on L100; 25.7MFLOPS on L300; 101MFLOPS on L1000. Host: DEC or IBM. Period: 1985 - M64/140 available commercially; 1988 - M64/145. References: [368] [375]

MACSYM: (A - Kyoto U.) Applications: Image Processing. Control: MIMD. Number of PEs: 1 Master + ≤16 Slaves. Type of PE: Zilog Z-8001, 16-bit, 250ns CP. Interconnection Network: Hierarchical buses: common - 16-bit data, 32-bit address + local (PE) bus. Memory Organization: Global, 14Mbit + local (128KB/PE). Host: Master Z-8001. Period: 1982/83 - 4-PE prototype. References: [200] [222]

MAGNUM: (C - HCL (India)) Applications: General. Control: MIMD. Number of PEs: ≤6. Type of PE: MC68030. Interconnection Network: VMEbus + Memory bus. Memory Organization: Global. Period: 1989 - announced. References: [46]

Makbilan: (A - Hebrew U. (Israel)) Applications: General. Control: MIMD. Number of PEs: ≤16. Type of PE: Intel iSBC386/120 with i80386 + i80387, 50ns CP, 32-bit + message-passing coprocessor. Interconnection Network: Bus (Multibus II), 32-bit wide, 25ns cycle (160MB/s). Memory Organization: Distributed, 4MB/PE. Performance: Peak: ≤10MIPS/PE, sustained: 4-5MIPS/PE (VAX MIPS). Host: Sun 3/50 + Controller: iSBC386/100 Unix host, 62.5ns CP, 4MB RAM. Period: 1989 - prototype. References: [121] [343]

Manchester Data Flow Computer: (A - U. of Manchester) Applications: General. Control: Dataflow. Number of PEs: ≤20. Type of PE: 20 Functional Units per PE, based on AMD Am2903 (bit-slice) + pipelined vector support; 24-bit, 200ns CP, 5-50 cycles per instruction. Interconnection Network: Ring ("Conveyor Belt"), token passing, 96-bit tokens, 300ns insertion time (a multistaged IN planned). Memory Organization: Global (structure store), 0.5MW (37-bit), 0.75M reads/s + 0.375M writes/s & Local, 32K tokens (384KB) in token queue, 1.25M tokens (15MB) in matching unit. Performance: Measured: 1-2MIPS/PE (0.27MFLOPS/FU peak, 0.172MFLOPS/FU measured). Host: VAX-11/780. Period: 1981 - prototype. References: [8] [159] [445] [449]

MANJI-II: (A + C - Keio U. + Fuji Xerox) Applications: Production System. Control: Dataflow. Number of PEs: 32: 1 Action Processor + 31 Matching Processors. Type of PE: Custom microprocessor. Interconnection Network: Bus, token passing. Memory Organization: Local + shared message buffer. Period: 1989 - prototype. References: [290]

MAPLE: (CR - Fujitsu Laboratories Ltd.) Applications: CAD (Routing, Layout). Control: SIMD. Number of PEs: ≤64K (16 units × 16 boards × 8 chips × 32 PEs). Type of PE: Fujitsu C-100KAV ("sea of gates") array chips, 32 bit-serial PEs/chip - ALU + 512-bit data register, 50ns CP. Interconnection Network: 4-nearest-neighbor mesh. Memory Organization: Local, 32Kbit. Performance: 40 GOPS (32-bit integer). Host: Yes + Controller: Am29000 + 28MB memory. Period: 4K-PE prototype built by 1990. References: [388]

MaRS: (NL - CERT (France)) Applications: Purely Functional Languages. Control: Demand Driven. Number of PEs: Several modules. Each includes: 16 Reduction Processors (RPs), 64 Communication Processors (CPs), 12 Memory Processors (MPs), 3 Arithmetic Processors (APs), 1 I/O Processor (IOP). Type of PE: Custom (all). Interconnection Network: Two Ω subnets: one for memory load and one for reduction load; each made up of 4 stages of 2×2 switches (CPs). Memory Organization: Global, 1Mcell × 38-bit/MP. Host: Yes. Period: 1989 - prototype. References: [85]

MARS-M: (NL - Computer Division of the Siberian Department of the USSR Academy of Science) Applications: Numeric. Control: MIMD. Number of PEs: 8. Type of PE: Custom: 4 Execution processors + 4 Address processors, 48-bit. Interconnection Network: Fully connected via Asynchronous Channels. Memory Organization: Shared. Performance: Peak: 20MFLOPS & 27.75MIPS. Host: Fits into an El'brus-2 as a Special Processor. Period: 1988 - prototype. References: [470]

Masscomp 5700, 6700: (C - Masscomp) Applications: General. Control: MIMD. Number of PEs: 1-4 (5700), 1-5 (6700). Type of PE: MC68020 + MC68881 FPA (5700), MC68030 + FP coprocessor + FPA (optional) + 4 vector accelerators (6700). Interconnection Network: Multiflow buses. Memory Organization: Global. Performance: 5700: 2.5MIPS, 13MFLOPS/PE; 6700: 10MIPS, 0.56MFLOPS/PE. Period: Available commercially since around 1985. References: [194]

MATP Real Time III: (C - Data West) Applications: General. Control: Multiple Vector Processors. Number of PEs: 1-4. Type of PE: Custom, pipelined, 100ns CP. Interconnection Network: Via multiported memory. Memory Organization: Global, 64KB. Performance: Peak: 120MFLOPS - 80M FP Additions/s + 40M FP Multiplications/s. Host: Yes. Period: 1977/78 - available commercially. References: [436]

MC860VS: (C - Mercury Computer Systems) Applications: General. Control: MIMD. Number of PEs: ~32, 4 PEs/board - 9U VME card. Type of PE: i860, 25ns CP. Interconnection Network: Crossbar, 480MB/s, 64-bit paths, 6 ports/board: one to each PE, one to interboard bus (160MB/s), one to peripheral bus (160MB/s). Memory Organization: Distributed, shared, 2-64MB/board (also message passing). Performance: Peak: 80MFLOPS/PE. Period: 7/1991 - available commercially. References: [340] [468]

Megaframe Supercluster - 64 & 256: (C - Parsytec) Applications: General. Control: MIMD. Number of PEs: 64 or 256: 4 or 4×4 Computing Clusters, each 4 modules of 4 PEs. Type of PE: T800 or T801 Transputer. Interconnection Network: Hierarchy of 96×96 switches, controlled by 2 Transputers (Network Configuring Units); bottom level connects 16 PEs; each level-(n+1) NCU connects 4 level-n NCUs; 1.8MB/s per link, message passing. Memory Organization: Local, 1-4MB/PE; available at 5GB/s (T800 on chip), 1GB/s (T800 external); 7GB/s (T801 on chip), 4GB/s (T801 external). Performance: Peak: 10MIPS, 1.5MFLOPS/T800; 15MIPS, 2.25MFLOPS/T801. Sustained: 1.15MFLOPS/T800; 0.42MFLOPS/T800 on L100. Host: Sun or similar. Period: 1988 - available commercially. References: [237] [238]
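
A minimal sketch of the switch hierarchy described in the Megaframe entry above (Python; it assumes, as the entry states, that a bottom-level Network Configuring Unit serves 16 PEs and each higher-level NCU gathers 4 lower-level NCUs; the function name is an illustrative assumption, not Parsytec software):

    def ncu_levels(num_pes):
        """Levels of NCUs needed above the bottom layer for a given PE count."""
        groups = num_pes // 16          # bottom-level NCUs, one per 16 PEs
        levels = 0
        while groups > 1:
            groups = -(-groups // 4)    # each upper NCU fans in 4 lower NCUs
            levels += 1
        return levels

    # Example: the 64-PE machine needs 1 extra level, the 256-PE machine 2.
    print(ncu_levels(64), ncu_levels(256))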

Melbourne Decoupled Multicomputer Architecture: (A - University of Melbourne (Australia)) Applications: General. Control: MIMD. Number of PEs: 3 on current prototype. Type of PE: AMD Am29050 RISC, 50ns CP. Interconnection Network: Optical Crossbar using FDDI chips and bit-serial links, 100Mbit/s per link. Memory Organization: Local. Period: 1992 - 3-PE prototype. References: [344]

MIDAS: (A - UC Berkeley) Applications: General (Data Analysis). Control: MIMD. Number of PEs: 1 primary, ~8 secondary computers with 1 Modular Processing Array (MPA): 16 PEs + I/O processor each. Type of PE: ModComp 7870, 64-bit FP with internal concurrency (MPA Element). Interconnection Network: In MPA - limited crossbar to memory (every 8 PEs are multiplexed into one line). Memory Organization: Global: 24 MMs × 256KB per MPA + 32MB system memory. Performance: 0.3MFLOPS/PE in the MPA. Period: 1983 - prototype: 1 primary, 1 secondary + MPA. References: [182] [183] [271]

Minerva: (A - Stanford U.) Applications: General. Control: MIMD. Number of PEs: 12. Type of PE: 8: i8080, 8-bit + 4: i3000, 32-bit. Interconnection Network: Bus (IDBUS), demand multiplexed. Memory Organization: Global: 16KW (32-bit) DRAM, 300ns access + 1KW (32-bit) ROM, 1.5μs access + local: 4KB/i8080, 1KW (32-bit)/i3000. Period: 1976 - prototype. References: [182] [183] [467]

MIT J-Machine: (A + C - MIT + Intel Corp.) Applications: General. Control: Message-driven activation of stored code (Object Oriented). Number of PEs: ~64K. Type of PE: Custom Message Driven Processor (MDP). Interconnection Network: 3D Mesh, deterministic e-cube routing with virtual channels. Memory Organization: Local, 1MB/PE on 64-PE prototype. Period: 7/1991 - 64-PE prototype using the initial ("A-Step") PEs; 4/1992 - 64-PE prototype using fixed ("B-Step") PEs; 11/1992 - 512-PE prototype. References: [93] [94] [309]
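
The deterministic e-cube (dimension-order) routing named in the entry above resolves a packet's offset one mesh dimension at a time, so every packet between the same pair of nodes takes the same path. A minimal sketch of the idea (Python; coordinates and names are illustrative assumptions, and the MDP's virtual-channel handling is not modeled):

    def ecube_route(src, dst):
        """Hops from src to dst on a 3-D mesh: fix X, then Y, then Z."""
        cur = list(src)
        path = [tuple(cur)]
        for axis in range(3):                 # axis 0 = X, 1 = Y, 2 = Z
            step = 1 if dst[axis] > cur[axis] else -1
            while cur[axis] != dst[axis]:
                cur[axis] += step
                path.append(tuple(cur))
        return path

    # Example: the unique e-cube path from node (0, 0, 0) to node (2, 3, 1).
    print(ecube_route((0, 0, 0), (2, 3, 1)))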

MITE: (CR - IBM Yorktown Heights) Applications: Cytocomputer (Image Processing). Control: MIMD. Number of PEs: 512 + 64 enumerators: 8 cages × (1-8 groups × (8 PEs + 1 Boolean Combinator) + 8 enumerators [optional]). Type of PE: Custom, 1-bit Look-Up Tables (LUTs). Interconnection Network: Via Boolean Combinator (smart crosspoint) in group, multibus in cage, linear among cages. Memory Organization: Local, 16Kbit/PE. Performance: 2.5G neighborhood operations/s for a fully populated system. Host: IBM-PC/XT, connected via a multibus to a control cage + 256KB RAM. Period: 1985 - prototype. References: [224]

MODULOR: (A - ONERA-CERT (France)) Applications: General. Control: MIMD. Number of PEs: ≥80: 4 modules of 20 PEs. Type of PE: INMOS T800 Transputer. Interconnection Network: Hierarchy of crossbars: 4 within module, 4 among modules, realized using INMOS C004 switching elements, controlled by 2×T414. Memory Organization: Local. Period: Built by 1992. References: [95]

Monsoon: (A - MIT) Applications: General. Control: Dataflow with I-structures. Number of PEs: ≤8. Type of PE: Custom, pipelined PE; Motorola-fabricated PEs planned for 16-PE model. Interconnection Network: Multistaged, packet switching between PEs and to I-structures, 4-bit serial links, 2 ports/PE, 5ns cycle, 100MB/s per port. Memory Organization: Shared I-structures + local token queues, 128KW×72-bit Frame Store + 128KW×32-bit instruction store. Performance: 6M tokens/s peak, 3M tokens/s average; 10M tokens/s planned for 16-PE machine. Host: TI Explorer LISP Machine. Period: 10/88 - 1-PE prototype; 1991 - 8 PEs. References: [316]

MP-1, MP-2: (C - MasPar Computer) Applications: General. Control: SIMD. Number of PEs: 1K-16K (1-16 boards × 32 chips × 2 clusters × 16 PEs). Type of PE: Custom, 4-bit (MP-1); 32-bit RISC (MP-2), with FP (32 & 64-bit), 70ns CP. Interconnection Network: 8-nearest-neighbor torus ("X-network"), 1-bit wide + hierarchical, 3-level, circuit-switched crossbar for global random communication, BW: 21GB/s on MP-1; 40MB/s per 32 PEs (chip) on MP-2 + 2 global buses: 1 for controller broadcast, 1 for logical OR tree. Memory Organization: Local, ≤64KB/PE: on-chip PMEMs (16/cluster, 12GB/s total BW) and off-chip (16KB/PE) + ≤256MB I/O RAM. Performance: Peak: MP-1 - 30 GIPS/16K PEs (32-bit integer +); 1.5 GFLOPS/16K PEs (32-bit); 650 MFLOPS (64-bit); MP-2 - 133MIPS, 12.3MFLOPS/chip (32 PEs), 21GFLOPS tested on 4K PEs. Host: VAXstation 3520 with Ultrix or DECstation 5000 model 200 RISC + Array Control Unit (runs the PE array + independent computing power). Period: 1/1990 - available commercially; 10/1992 - MP-2 announced, 4K-PE model installed at Ames Lab., Iowa State U. References: [56] [156] [184] [278] [306]
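
The 8-nearest-neighbor torus ("X-network") in the entry above gives every PE eight neighbors, with the array edges wrapping around. A minimal sketch of that adjacency (Python; the grid dimensions and names are illustrative assumptions, not MasPar parameters):

    def xnet_neighbors(row, col, rows=128, cols=128):
        """Torus coordinates of the 8 neighbors of the PE at (row, col)."""
        return [((row + dr) % rows, (col + dc) % cols)
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)]

    # Example: a corner PE wraps around to the opposite edges of the array.
    print(xnet_neighbors(0, 0, rows=4, cols=4))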

MPH (Hybrid Parallel Machine): (A - Federal U. of Rio de Janeiro) Applications: General. Control: MIMD. Number of PEs: ≤32. Type of PE: MC6809, 8-bit. Interconnection Network: Hypercube, links via dual-ported memory segments (shared memory & message passing). Memory Organization: Distributed, each segment (256B - 4KB) shared by 2 PEs + Local, ≤24KB/PE. Period: 1989 - 16 PEs. References: [122]

MPP: (C - Goodyear) Applications: Image Processing. Control: SIMD. Number of PEs: 16K (128×128). Type of PE: Custom bit-serial, 100ns CP. Interconnection Network: 2-D rectangular, reconfigurable into grid, cylinders, torus, spirals (offset cylinders), cycles (end-connected spirals). Memory Organization: Local, 1Kbit/PE arranged in 128×128 1-bit planes (1024 such). Plane transfer - 1 cycle - 100ns (20GB/s total). Plane staging - 128 cycles (160MB/s). Performance: 6.5GIPS, 400MFLOPS on 8-bit additions/subtractions. Host: VAX-11/780 + Program & Data Management Unit (PDP-11/34A). Period: 1983 - 1 built. References: [7] [8] [36] [194] [195] [208] [284] [292] [345] [443]

MPS: (A - Beijing Polytechnic U.) Applications: General. Control: MIMD. Number of PEs: 16. Type of PE: i8088. Interconnection Network: Ring, 24-bit, message passing: 8-bit data, 4-bit source + 4-bit destination addresses & 8-bit control. Memory Organization: Local, ≤256KB/PE CMOS RAM. Host: PE 16 is a PC/AT host. Period: Built by 1990. References: [478]

MU6V: (A - Manchester U.) Applications: General. Control: Multiple Vector Processors. Number of PEs: ~16. Type of PE: Custom CPU + vector processor. Interconnection Network: Bus ("Common Communication Highway"), packet passing. Memory Organization: Distributed. Period: 1984 - 3 PEs (MC68000) with emulated vector processor. References: [197]

MultiMicro: (A - IISc (India)) Applications: General. Control: MIMD. Number of PEs: 7. Type of PE: i80286/80287. Interconnection Network: Bus. Memory Organization: Global, 2MB. Host: Yes. Period: Built by 1989. References: [46]

Multi-PSI: (NL - ICOT (Japan)) Applications: Reduction/Inference. Control: Demand-Driven. Number of PEs: ~64. Type of PE: Custom "PSI-II." Interconnection Network: 2-D mesh. Memory Organization: Local. Performance: ~130K append reductions/s. Period: 1990 - prototype. References: [141]

MULTITOP: (NL - Max Planck Plasma Physics Institute) Applications: Nuclear Fusion Experiments Control. Control: MIMD. Number of PEs: ~33. Type of PE: Transputers, serving as Master CPU or Computing CPU. Interconnection Network: Variable Topology + Bus. Memory Organization: Local, 0.5MB (SRAM) - 4MB (DRAM)/PE. Host: UNIX Workstation via VMEbus. Period: 1988 - first prototype; 1991 - 33-PE full controller. References: [333]

MUMS: (A - U. of Lund (Sweden)) Applications: General. Control: MIMD. Number of PEs: 38. Type of PE: 2×NS32016/PE; one is the CPU, 100ns CP; the other a memory access manager. Interconnection Network: Bus, 40 bits, 80ns cycle; message passing. Memory Organization: Local, 512KB/PE. Host: μVAX, connected to one node. Period: 1985/86 - prototype. References: [413]

MUNAP: (A - Utsunomiya U. (Japan)) Applications: General (Non-numeric). Control: MIMD. Number of PEs: 4. Type of PE: Nanodata QM-1, 16-bit, based on AMD Am2903 bit-slice ALU. Interconnection Network: Multiple buses: 1 I-bus + 1 O-bus per PE (each 16-bit), connected via a shuffle exchange, which is made up of 4 levels of 16×4-bit exchange & broadcast cells. Memory Organization: Global, 8 MMs × 16KW + local, 4KW/PE. Host: The Microlevel Processor. Period: 1981 - prototype. References: [28]

MUSE: (A - MIT Lincoln Lab.) Applications: Signal Processing. Control: Systolic. Number of PEs: 32. Type of PE: Custom. Interconnection Network: Linear. Memory Organization: Local. Performance: Peak: 1.7GFLOPS. Period: 1990 - 4-PE testbed operational; full implementation underway. References: [6]

Myrias 4000, SPS-3: (C - Myrias Research) Applications: General. Control: MIMD. Number of PEs: 4K-64K (8 PEs + service processor = board; 16 boards + backplane, service module, communication board = cage). Type of PE: MC68000, then MC68020 + MC68881 + MC68851 MMU; on SPS-3: MC68040. Interconnection Network: Hierarchical buses (board, cage). Memory Organization: Global: 512MB-8GB + local: 128KB. A 4K system - usable BW of 20GB/s. Performance: 1600MFLOPS for full configuration. Period: 1987 - 512-PE prototype; 1988 - available commercially; 1990 - SPS-3; 10/1990 - closed down. References: [44] [261] [351] [378] [443]

NAS AS9080: (C - NAS) Applications: General. Control: MIMD. Number of PEs: 2 + vector processor. Type of PE: Custom, 30ns CP. Interconnection Network: Via multiported memory & registers. Memory Organization: Global: ≤64MB, 320ns cycle. Performance: 20MIPS, 200MFLOPS. Period: 1985 - Available commercially. References: [32]

nCUBE/x, nCUBE2/x Series (x = log2 of the number of PEs): (C - nCUBE) Applications: General. Control: MIMD. Number of PEs: 2^x: x = 2-10 on nCUBE, x = 6-13 on nCUBE2. Type of PE: Custom; nCUBE: 32-bit + FP coprocessor, 64-bit, 100ns CP (FP ×: ~2μs); nCUBE2: 64-bit, 50ns CP. Interconnection Network: Hypercube, bit-serial channels; nCUBE: 125ns cycle, peak: 8Mbit/s per channel in each direction, sustained 0.2Mbit/s; nCUBE2: special Message Routing circuits; 50ns cycle, 20Mbit/s. Memory Organization: Local, nCUBE: 512KB/PE; nCUBE2: 4-64MB/PE. Performance: nCUBE peak: 1.9MIPS, 0.5MFLOPS (32-bit), 0.3MFLOPS (64-bit) per PE; tested: (nCUBE/ten) 0.135MFLOPS/PE on L100; nCUBE2 peak: 7.5MIPS, 3.2MFLOPS (32-bit), 2.4MFLOPS (64-bit) per PE. Host: i80286 + AXIS. Period: 1985 - available commercially; 7/1989 - nCUBE2. References: [8] [145] [171] [194] [208] [215] [259] [315] [379] [380] [389] [443]
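
As the series name in the nCUBE entry above suggests, a machine with 2^x PEs is an x-dimensional hypercube: node addresses are x-bit numbers, and two nodes share a channel exactly when their addresses differ in a single bit. A minimal sketch of that structure (Python; the function names are illustrative assumptions, not nCUBE software):

    def hypercube_neighbors(node, x):
        """Neighbors of `node` in an x-dimensional hypercube (2**x PEs)."""
        return [node ^ (1 << dim) for dim in range(x)]

    def hops(a, b):
        """Minimum hop count = number of differing address bits."""
        return bin(a ^ b).count("1")

    # Example: an nCUBE/ten-sized machine (x = 10, 1024 PEs).
    print(hypercube_neighbors(0, 10))   # nodes 1, 2, 4, ..., 512
    print(hops(0, 1023))                # 10 hops corner to corner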

NERV: (A - U. of Heidelberg (Germany)) Applications: Neural Network Simulation. Control: MIMD. Number of PEs: ≤320 (20 boards × ≤16 PEs + Master PE - VME board). Type of PE: MC68020, 40ns CP. Interconnection Network: VMEbus + global MAX finder + broadcast bus. Memory Organization: Local, 1MB/PE. Performance: ≤1300 MIPS = 13K neurons. Host: Mac II + controller (Master - MC68020 + 4MB). Period: By 1989 - limited prototype (50ns CP, 512KB/PE). References: [267]

Non-Von-1,3,4: (A - Columbia U.) Applications: Reduction/Inference. Control: Multiple SIMD. Number of PEs: 1M Small Processing Elements (SPEs) on Primary Processing System (PPS); 511-1023 Large Processing Elements (LPEs) for 256K-1M SPEs on Non-Von-4. Type of PE: 1-bit SPEs (NV-1), 8-bit SPEs (NV-3), MC68000 or NS16032 LPEs (NV-4). Interconnection Network: Binary tree among SPEs; Banyan/Ω network among LPEs - circuit switched, message passing. Memory Organization: Local: 64B/SPE (NV-1,3), 256B/SPE, 256KB-1MB/LPE (NV-4). Performance: LPE: 3 MIPS. Host: 1 control processor on NV-1,3, VAX-11/750 on NV-4. Period: 1983 - 3-SPE prototype (NV-1). References: [7] [8] [48] [251] [387] [444]

Norsk-Data ND-5900 Series: (C - Norsk-Data) Applications: General. Control: MIMD. Number of PEs: 2-4. Type of PE: Custom. Interconnection Network: Bus. Memory Organization: Global, 20-512MB. Performance: 6.5MIPS/PE (Whetstone, unoptimized FORTRAN). Period: 1984 - available commercially. References: [208]

NP-1: (C - Gould) Applications: General. Control: Multiple Vector Processors. Number of PEs: 1-8 PEs, in pairs. Type of PE: Custom, 52ns CP, scalar + Arithmetic Accelerator (opt.) + 16KB I-cache + 16KB D-cache. Interconnection Network: Bus ("System bus") for each pair of PEs, 52ns cycle, 64-bit data + control, 154MB/s BW; full interconnect among pairs via Inter-System Bus Links (ISBLs). Memory Organization: Distributed shared, 16-64MB/PE, 4GB address space, 1GB/System Bus. Performance: 3.3MFLOPS/PE (64-bit), 5.1MFLOPS/PE (32-bit) on L100, with accelerator. Period: 9/1987 - available commercially. References: [370] [451]

OMEN-60 Series: (C - Sanders Associates) Applications: Signal Processing (General). Control: SIMD. Number of PEs: 64 Vertical Arithmetic Units (VAUs) & 1 Horizontal Arithmetic Unit (HAU). Type of PE: Custom in VAU: 1-bit (OMEN 61 & 62), 16-bit (OMEN 63 & 64); PDP-11 in HAU. Interconnection Network: Linear, with some other communication modes available through skew memory; message passing. Memory Organization: Global Orthogonal Memory, 8-128KW (16-bit), accessible either horizontally (by word), at 2MW/s, or vertically (bit-slice) at 45MW/s. Period: Early 1970s - available commercially. References: [381]

OSCAR: (A - Waseda U. (Japan)) Applications: Solution of Sparse Linear Equations. Control: MIMD. Number of PEs: ≤128: ≤8 clusters × 16 PEs. Type of PE: Custom, RISC, 32-bit, 200ns CP. Interconnection Network: 3 buses within each cluster, 32-bit, 20MB/s BW, connected to PE via 2KW (32-bit) DPR. Memory Organization: Distributed shared: 3 MMs/cluster (accessible from all three buses) + Local, 256KW (32-bit) data memory + 2 × 128KW instruction memory. Performance: 5MFLOPS/PE. Host: Unix-based workstation. Period: 1988 - 1 cluster built. References: [216] [217]

OUPPI-1: (A - U. of Provence (France)) Applications: Physical Systems Simulation. Control: SIMD. Number of PEs: 144 (2 chips × 72 PEs). Type of PE: GAPP chips, bit-serial, 100ns CP (5 instructions/cycle). Interconnection Network: Torus. Memory Organization: Local, 128-bit/PE. Host: Yes + Am2910A-based controller. Period: 1988 - prototype built. References: [321]

PACE: (NL + C - Advanced Numerical Research and Analysis Group (ANURAG) + Electronics Corporation of India Ltd. (ECIL)) Applications: Computational Fluid Dynamics. Control: MIMD. Number of PEs: ≤128. Type of PE: MC68020 + Weitek 1164/65 (initially), custom coprocessors (eventually). Interconnection Network: Hypercube. Memory Organization: Local. Period: 1H1989 - 4-PE prototype; 2H1991 - 128 PEs. References: [46]

PADMAVATI: (A + CR - CSELT Laboratories, Turin Polytechnic & Turin U. (Italy) + GEC Hirst Research Centre (UK)) Applications: Artificial Intelligence/Speech Recognition. Control: MIMD. Number of PEs: 16. Type of PE: T424 or T800 Transputer, 50ns CP (T800). Interconnection Network: Delta Network (multistaged), packet switching with cut-through (wormhole) routing, 2.5MB/s per port. Memory Organization: Distributed, via message passing to "owner" PE, 8-16MB/PE + CAM boards (54 × 148W × 32-bit). Period: 1987/88 - prototype. References: [23] [275]

PAPIA: (A - Pavia U. + other Italian universities) Applications: Image Processing. Control: Multiple SIMD. Number of PEs: 21845 (pyramid with a 128×128 base). Type of PE: Custom, 1-bit (5 per chip). Interconnection Network: Quadtree Pyramid + 4-nearest-neighbor plane connections. Memory Organization: Local, 256-bit/PE. Host: Yes + control units (one for each 2 layers of the pyramid). Period: 1987/88 - prototype. References: [74] [345]
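
The 21845-PE count in the entry above is simply the size of a full quadtree pyramid over a 128×128 base, each layer one quarter of the layer below. A minimal sketch of that structure (Python; the level and indexing conventions are illustrative assumptions):

    def pyramid_size(base=128):
        """Total PEs in a quadtree pyramid whose base layer is base x base."""
        total, side = 0, base
        while side >= 1:
            total += side * side
            side //= 2
        return total

    def parent(level, row, col):
        """Quadtree parent of the PE at (row, col) on `level` (level 0 = base)."""
        return (level + 1, row // 2, col // 2)

    print(pyramid_size(128))    # 21845, matching the entry
    print(parent(0, 100, 37))   # (1, 50, 18)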

Paragon XP/S (Research version: "Touchstone Delta" - "TD"): (CR + C - Intel) Applications: General. Control: MIMD. Number of PEs: ≤4K (512 compute nodes, 32 disk nodes, 26 service nodes). Type of PE: i860XP, 20ns CP (2 instructions/cycle) + 16KB I-cache, 16KB D-cache + i860XP for communication handling. Interconnection Network: 2D wormhole-routed mesh, 16×36 (4-nearest-neighbor), BW: 200MB/s per link full duplex (25MB/s BW, 80μs latency on TD). Memory Organization: 16-128MB/PE (16MB/computational PE on TD). Performance: Peak: 75MFLOPS (64-bit), 42 MIPS/PE (60MFLOPS/PE on TD); tested: ≤15GFLOPS for 528-PE TD; 13.9GFLOPS MP Linpack on an order-20K problem. Period: 1991 - Touchstone Delta prototype at CalTech: 576 PEs - 36 columns × 16 rows; 33 columns - computational (+ 16MB RAM/PE), 2 columns - I/O (1.4GB disk/PE), 1 column for HiPPI interface (100MB/s each). References: [42] [109] [156] [258]

Paralex "Gemini" & "Pegasus": (C - Paralex Research) Applications: General. Control: MIMD. Number of PEs: 1024 (Gemini), 512 (Pegasus). Type of PE: i80286 (Gemini), SPARC (Pegasus). Interconnection Network: Hypercube. Memory Organization: Local: 16MB/PE (Pegasus). Performance: 2GIPS + 500MFLOPS (Gemini), 25GIPS + 15GFLOPS (Pegasus). Host: Unix front end. Period: 1985/86 - Gemini, 1988/89 - Pegasus. References: [261J

PARK: (A - Kobe U. (Japan)) Applications: Reduction. Control: MIMD. Number of PEs: ≤16 (1 host + 15 slaves). Type of PE: MC68000, 125ns CP, 16-bit + address translating unit. Interconnection Network: Bus. Memory Organization: Distributed, 512KB/PE (Dual Ported RAM) + local, 256KB on host, 128KB/PE on slaves. Host: Yes, PE number 0. Period: 1987/88 - 1-master, 3-slave prototype. References: [280]

ParSiFal: (A + C - A team led by Manchester U.) Applications: General. Control: MIMD. Number of PEs: 1-2 T-Racks × 16 cards, 4 PEs each. Type of PE: T800 Transputer. Interconnection Network: Linear + 2 crossbars. Memory Organization: Local, 1MB/PE. Host: Sun-3/160, via VMEbus. Period: 1986/87 - prototype. References: [191]

Parsytec GC and GCel: (C - Parsytec (Germany)) Applications: General. Control: MIMD. Number of PEs: 64 to 65536 (up to 4096 for GCel). Type of PE: Inmos T9000, 20ns CP for GC; Inmos T805, 33.3ns CP for GCel. Interconnection Network: 3-D network of clusters of 16 PEs ("Mesh of Clos Networks"). Nonblocking, wormhole routing. HW partitioning. 160MB/s per link of the grid of clusters; 80MB/s per PE. Able to mask out failed nodes through redundancy. Tree network for I/O. (2-D network of clusters of 16 PEs for GCel.) Memory Organization: Local, 8-32MB/PE (4MB for GCel). Performance: Peak: 1.6 GFLOPS - 1.6 TFLOPS; average: 30 GFLOPS/4K nodes. Period: GC commercially available 1993, upgrading the GCel (first GCel 1990; largest GCel delivery by 10/92 contained 1024 nodes).

Parwell-1: (C - pI (Germany)) Applications: General. Control: MIMD. Number of PEs: 37: 1 Master, 4 Submasters, 4 clusters × 8 slaves. Type of PE: MC68020/MC68881. Interconnection Network: Hierarchical buses, 32-bit: one within each cluster (8 slaves + Submaster), one among clusters (4 Submasters + 1 Master), message passing. Memory Organization: Hierarchically accessed distributed (shared): a PE accesses its own + subordinate PEs' memory, 4MB/PE, DPR. Host: Apollo WS + Unix. Period: 1988 - available commercially. References: [64]

PASM: (A - Purdue U.) Applications: Image Processing. Control: Partitionable SIMD/MIMD. Number of PEs: 16-1024. Type of PE: MC68010 (later - custom). Interconnection Network: Generalized ("Extra Stage") Cube (rate: 400ns for 16-bit) or Augmented Data Manipulator (circuit switched) + Control buses. Memory Organization: Global: 1 MM per 4 PEs, 64KW each & local: 256KB, expandable to 2MB. Performance: ~0.5MIPS/PE. Host: 1 microcontroller per 32 PEs. Period: 1984 - 16-PE prototype (using MC68000). References: [7] [8] [182] [183] [396] [397]

PAX (or PACS) Series: (A + C - U. of Tsukuba + Mitsui) Applications: General. Control: (Quasi-) MIMD. Number of PEs: 32 (PAX-64J), 8-64 (MiPAX-JFV), 128 (PAX-128), 288 (QCD-PAX). Type of PE: DCJ-11 (64J); DCJ-11 + FPJ-11 FP coprocessor (MiPAX); MC68B00 (500ns CP) + Am9511 (250ns CP) FP coprocessor (PAX-128); MC68020 (40ns CP) + L64133 FPD (QCD-PAX). Interconnection Network: Torus + bus to host, broadcast available through bus adapter, message passing. Memory Organization: Distributed: 136KB/PE (64J); 264KB/PE (MiPAX), 320KB/PE (128); 4MB/PE, 100ns cycle + 2MB/FPD, 35ns cycle (QCD) + Communication memory, interspersed with PEs. Performance: 3.2MFLOPS (64J); 2.5MFLOPS/8 PEs (MiPAX); 4MFLOPS/PE (128); Peak: 32MFLOPS/PE, tested: 16MFLOPS/PE (QCD). Host: TI990/20; Sun-3/260 (QCD). Period: 1980 - PACS-32; 1982/83 - PAX-64J, MiPAX; 1984 - PAX-128; 1988 - 4-PE QCD; 1990 - 288-PE QCD. References: [8] [188] [301] [313] [392]

PCLIP (Prototype Pyramid Machine): (A - U. of Washington) Applications: Image Processing. Control: SIMD. Number of PEs: 85 (8×8 + 4×4 + 2×2 + 1). Type of PE: Custom, 1-bit. Interconnection Network: Quadtree Pyramid + 8-nearest-neighbor plane connections. Memory Organization: Local, 8Kbit/PE off-chip + 3 bit/PE on-chip; base level memory accessible to the host. Host: IBM PC-AT + Pyramid Controller. Period: 1987 - prototype. References: [491]

PEPE: (C - Burroughs) Applications: Signal Processing. Control: SIMD. Number of PEs: ≤288 (≤8 bays of 36 PEs). Type of PE: Custom, 32-bit, FP processing, made up of 3 separately controlled subunits: Arithmetic Unit, Correlation Control Unit, Associative Output Unit. Interconnection Network: Linear + Through Control Unit. Memory Organization: Global buffer, 32K×32-bit + local, 1K×32-bit. Performance: ~100MFLOPS. Host: CDC 7600. Period: 1970 - 1-bay (36 PEs) prototype delivered; 1974 - product. References: [123] [279] [418] [452]

PICAP II: (A - Linköping U. (Sweden)) Applications: Image Processing. Control: MIMD/MSIMD (FIP - SIMD). Number of PEs: ≤16. Type of PE: Various types: video input, display, graphical overlays, filter, logical neighborhood, segmentation; FIlter Processor (FIP): 4-PE pipeline. Interconnection Network: Bus, 32-bit wide, 100ns cycle, 40MB/s BW. Memory Organization: Shared, 4MB (16 MMs × 256KB) + local in FIP (32KB). Performance: FIP peak: 100MIPS (8-bit). Host: Yes - PDP or SEL 77/35. Period: 1982 - operational. References: [295] [296]

PIM-D: (NL + C - ICOT + OKI Electric Industry) Applications: Reduction/Inference. Control: Dataflow. Number of PEs: 16 (100 planned for PIM). Type of PE: Custom, using AMD Am2900 series processors. Interconnection Network: Hierarchical bus network. Memory Organization: Local, 15 Structure Memories. Period: 1986 - prototype. References: [48] [154]

PIM-R: (NL - ICOT) Applications: Reduction/Inference. Control: Demand Driven. Number of PEs: 16 (100 planned for PIM). Type of PE: MC68000. Interconnection Network: Torus. Memory Organization: Global & local. Period: 1986 - prototype. References: [48] [154] [444]

PIP (Programmable Image Processor): (A - MIT Lincoln Lab.) Applications: Image Processing. Control: SIMD. Number of PEs: 16 (eventually - 64). Type of PE: Custom, 32-bit. Interconnection Network: 2-D array of PEs and MMs, joined by 5 vertical & 4 horizontal buses with 4×5 switches. PEs interspersed with vertical buses along horizontal buses & MMs interspersed with horizontal buses along vertical buses. Memory Organization: Shared, 25 MMs × 2KB SRAM. Performance: 10 MIPS/PE. Period: 16-PE & 25-MM system built by 1990. References: [45]

PIPE: (NL + C - Bureau of Standards + Digital/Analog Design Associates (Aspex)) Applications: Image Processing. Control: MIMD. Number of PEs: ~8 Modular Processing Stages (MPSs). Type of PE: Custom, 8-bit. Interconnection Network: Linear + Feedback. Memory Organization: Local. Host: IBM-PC or compatible. Period: 1986/87 - first prototypes. References: [152] [220] [221]

Pixel-Planes 5: (A - U. of North Carolina at Chapel Hill) Applications: Interactive Rendering of Computer Graphics. Control: MSIMD. Number of PEs: 48 MIMD Processors; 20 SIMD (16K) arrays. Type of PE: Intel i860 (MIMD); custom (SIMD). Interconnection Network: Ring. Memory Organization: Local memory with message passing. Host: Sun 4/280. Period: 1990 - first prototype; 1992 - second prototype. References: [138]

PIXIE-5000: (C - Applied Intelligent Systems Inc.) Applications: Image Processing. Control: SIMD (Systolic Array). Number of PEs: 1-8 cards, 128 PEs/card. Type of PE: Custom, 1-bit. Interconnection Network: Linear. Memory Organization: Local, 8Kbit/PE. Performance: 3.5G neighborhood operations/s. Host: Force Computer CPU-1C (MC68000-based, with 512KB RAM). Period: 1985 - available commercially. References: [469]

Pleiades: (A - U. of Kent (UK)) Applications: Knowledge Base. Control: MIMD. Number of PEs: 1 Master + ≤20 Slaves. Type of PE: MC6800 + MC6820 Peripheral Interface Adapter. Interconnection Network: Crosspoint. Memory Organization: Local, ≤48KB/PE on slaves + 128B stack RAM, 8KB on master. Host: PDP-11/40 + UNIX. Period: 1978 - 1-master, 3-slave prototype. References: [82]

PLURIBUS: (C - BBN) Applications: ARPANET Controller. Control: MIMD. Number of PEs: ≤56. Type of PE: Lockheed SUE, 16-bit. Interconnection Network: Crossbar to memory (PE buses and Memory buses, fully connected). Memory Organization: Global, ≤1MB: 4 MMs × 128KW × 16-bit + local, 4KW/PE. Period: February 1973 - 14-PE prototype. References: [112] [172]

PLUS: (A - CMU) Applications: General. Control: MIMD. Number of PEs: ~64. Type of PE: MC88000, 40ns CP + 32KB I-cache, D-cache. Interconnection Network: Mesh of routers, 4-nearest-neighbor + 1 PE connection, 30MB/s each direction (designed at CalTech). Memory Organization: Distributed shared, 8-32MB/PE, in 2 banks, with Global memory mapping, implemented with Xilinx PLD/PALs + Local, 256KB SRAM. Host: Yes. Period: 1989 - one node, 1990 - multiple nodes. References: [54] [55]

Polyp: (A - Heidelberg U.) Applications: General (Data Analysis). Control: MIMD. Number of PEs: ≤~200, one or more per module. Type of PE: MC68000 or MC68020/68881 + 16KB cache (only in modules of more than 1 PE). Interconnection Network: Multiple (~20) buses (number proportional to number of PEs) among modules, local buses within modules. Memory Organization: Shared within module only, 1MB/module. Performance: 75MIPS for 30×MC68000 (100ns CP) + 2 buses; 400MIPS expected for 48×MC68020 (40ns CP) + 2 buses; 2GIPS expected for 210×MC68020 + 12 buses. Host: Yes. Period: Built around 1984/85. References: [266] [268]

POMP: (A - Ecole Normale Superieure (France)) Applications: Image Synthesis, Graphics, and Number Crunching. Control: SIMD. Number of PEs: ≤256 in parallel processor + 1 in scalar processor. Type of PE: MC88100, 32-bit RISC, 50ns CP + HyperCom communication chip. Interconnection Network: Hybrid (dynamic/static) Multistaged Interconnection Network (multiple indirect binary cube), circuit or packet switched. Memory Organization: Local, 128KW (32-bit)/PE SRAM, 35ns cycle. Performance: Peak: 17 MIPS + 10 MFLOPS/PE. Host: Yes (Sun). Period: 1991/92 - 1-PE prototype. Expected to be commercialized. References: [186]

POOMA: (CR - Philips Research Labs. (Netherlands)) Applications: Database. Control: MIMD. Number of PEs: 100. Type of PE: MC68020 + MC68881 + MC68851 MMU on board, 60ns CP, custom communication processor + Ethernet Controller (CMC ENP-10) via VMEbus + CDC Wren-IV 300MB hard disk at half of the PEs. Interconnection Network: Extended Chordal Ring (chordal ring with connections to 5 neighbors, at distances 1, 8, 9, 13, 18 away; implemented via 5 20×20 crossbar switches; 320MB/s BW - 20Mbit/s each link), message passing; the chord pattern is sketched below, after the PPS entry. Memory Organization: Local, 16MB/PE (4MB on-board, 12MB extension memory connected via VMEbus). Performance: 2MIPS/PE, 0.1MFLOPS/PE (with on-board memory used); 1.3MIPS/PE using extension memory. Host: Yes, via Ethernet. Period: Late 1988/Early 1989 - 100-node machine. References: [310] [454]

PPS: (NL - Centre for Development of Telematics (C-DOT), Bangalore, India) Applications: Weather Forecasting, Image Processing. Control: MIMD. Number of PEs: ~256. Type of PE: Transputer T800 + MC68010. Interconnection Network: 256×256 nonblocking crossbar switch, emulating a 512×512 crossbar ("essentially nonblocking"). Memory Organization: Global, 2 copies × ~256 MMs × 16MW (64-bit) + Local. Performance: Peak: 640 MFLOPS, 1GIPS; sustained: 200 MFLOPS. Host: Unix host/controller. Period: 8/1989 - 16-PE, 25MFLOPS prototype; 1990 - work on a 128-PE prototype. References: [46]
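
The extended chordal ring in the POOMA entry above links each of the 100 nodes to neighbors at fixed distances along the ring. A minimal sketch of that adjacency (Python; only the forward chords listed in the entry are enumerated, and the helper names are illustrative assumptions):

    CHORDS = (1, 8, 9, 13, 18)

    def chordal_neighbors(node, n=100, chords=CHORDS):
        """Forward neighbors of `node` on an n-node chordal ring."""
        return [(node + d) % n for d in chords]

    # Example: node 95 wraps around the ring for its longer chords.
    print(chordal_neighbors(95))   # [96, 3, 4, 8, 13]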

Presto: (CR - NTT Communication and Information Laboratories, Japan) Applications: Artificial Intelligence/Production Systems. Control: Dataflow. Number of PEs: ~10. Type of PE: i80386 + Memory Control Unit + Bus Control Unit. Interconnection Network: Bus, message passing. Memory Organization: Local, 4MB/PE. Host: NEC-PC9801. Period: Built by 1990. References: [223]

PRIME: (A - UC Berkeley) Applications: General. Control: MIMD. Number of PEs: 5. Type of PE: META 4 microprocessor, 16-bit. Interconnection Network: Multiple buses. Memory Organization: Partially Shared, 13 MMs × 2 × 4KW × 32-bit, 550ns cycle, 400ns access. Period: 1972/73 - prototype. References: [35]

Princeton Engine: (CR - David Sarnoff Research Center) Applications: Image and Signal Processing. Control: SIMD. Number of PEs: 64-2K PEs, 512 PEs/cabinet + controller cabinet. Type of PE: Custom, 71ns CP, 89-bit instruction word. Interconnection Network: Linear Array, 16-bit channel between PEs. Memory Organization: Local, 32-128KB/PE. Performance: 2.5 TOPS (1-bit), peak: 20MIPS/PE, tested: 28GIPS, 1 GFLOPS/2K PEs. Host: Apollo Mentor Graphics or Sun, via Multibus. Period: 1990 - 1024-PE prototype. References: [329]

Pringle: (A - Purdue U. & Washington U.) Applications: General. Control: MIMD. Number of PEs: 64. Type of PE: Intel i8031 (8-bit) + i8231 FPA, 83ns CP. Interconnection Network: Bus - switch polled at 8MHz, 8-bit wide (64Mbit/s), controlled by an i8086. Memory Organization: Local, 2KB. Performance: 64MIPS. Period: This testbed for CHiP [410] was built in 1984. References: [8] [182] [183] [213]

PRODIGY: (CR - Toshiba R&D) Applications: General. Control: MIMD. Number of PEs: ≤512 (8 backplanes × 8 boards × 8 PEs). Type of PE: SCC 68070, 16-bit, 100ns CP + Router. Interconnection Network: Hypercube: base-8 3-cube (PEs i, j are connected if their addresses differ in one base-8 digit; 3-D connections, each dimension an 8×8 crossbar, 8-bit wide, 5MB/s channel peak, 2.8MB/s tested). Memory Organization: Local, 2MB DRAM. Period: 1991 - 128 PEs. References: [430]
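
A minimal sketch of the base-8 3-cube adjacency described in the entry above (Python; addresses run 0-511, and the helper names are illustrative assumptions, not Toshiba software):

    def digits_base8(addr):
        """The three base-8 digits of a PE address in 0..511."""
        return (addr & 0o7, (addr >> 3) & 0o7, (addr >> 6) & 0o7)

    def connected(a, b):
        """True if PEs a and b differ in exactly one base-8 digit."""
        diffs = sum(da != db for da, db in zip(digits_base8(a), digits_base8(b)))
        return diffs == 1

    # Examples: 0o012 and 0o017 share a crossbar (only the low digit differs);
    # 0o012 and 0o077 do not (two digits differ).
    print(connected(0o012, 0o017), connected(0o012, 0o077))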

Production System Machine (PSM): (A - CMU) Applications: Reduction/Production Systems. Control: Demand driven. Number of PEs: 32-64. Type of PE: Custom. Interconnection Network: Multiple buses. Memory Organization: Global & local. Performance: ~2MIPS/PE. Period: A model was created using a 4-PE VAX-11/784 in the mid-1980s; prototype expected in late 1980s. References: [48] [251]

PS-2000, 2100: (NL + C - Scientific Research Institute of Control Computing Systems, Severodonetsk, Ukraine + Severodonetsk Instrument-Building Plant) Applications: Numeric + General. Control: SIMD (2000); MSIMD (2100). Number of PEs: 2000: 8-64, in blocks of 8; 2100: 64-640 - 1-10 modules of 64 PEs. Type of PE: Custom: 2000 - 24-bit, 320ns CP; 2100: 32-bit, 140ns CP. Interconnection Network: 2000: Linear, parallel bus among PEs + bus, segmentable to variable-size clusters (8, 16, 32 PEs); 2100: similar within modules + crossbar among modules. Memory Organization: Local, 48KB/PE on 2000, 128-512KB/PE on 2100. Performance: 2000 peak: 3.1MIPS/PE; 2100 peak: 2.4 MIPS/PE + 0.1MFLOPS/PE; 2100 tested: 5.3MFLOPS (total) on Linpack, 32-bit. Host: 2000: SM-series 16-bit minicomputer; 2100: PS-1001 32-bit minicomputer. Period: PS-2000: in production 1981-1989; PS-2100: 1987 - prototype, 1990 - serial production. References: [470] [471]

PSM: (A - U. of Tokyo) Applications: Image Processing. Control: MIMD. Number of PEs: 30 + 2 I/O processors (on PSM-32). Type of PE: Custom, pipelined RISC processor. Interconnection Network: 2 × Shuffle Network, using 8×8 & 2×2 connection elements, token passing, 100ns cycle. Memory Organization: Shared, 32 MMs. Period: PSM-32 constructed in 1990. References: [97]

PUPS (PμPS): (NL - Los Alamos National Lab.) Applications: General. Control: MIMD. Number of PEs: 20. Type of PE: i8086/87. Interconnection Network: Reconfigurable: Ring, Tree, Star or Tightly coupled. Memory Organization: Distributed or local (reconfigurable), 32 MMs, 16MB. Period: 1984 - hardware running, project cancelled. References: [282]

PX-1: (A - Tokyo Institute of Technology) Applications: Image Processing. Control: MIMD. Number of PEs: 32. Type of PE: Zilog Z-80. Interconnection Network: Ring bus. Memory Organization: Local, 64KB/PE. Host: Similar to PEs. Period: 1982/83 - prototype. References: [355]

QUEN 16: (A + C - Johns Hopkins U. Applied Physics Lab. + Interstate Electronics) Applications: General. Control: SIMD/Systolic. Number of PEs: ~16 (on the Johns Hopkins prototype). Type of PE: 32-bit DSP chip. Interconnection Network: Planar (or linear) array, via DPR. Memory Organization: Local, 256W × 64-bit + shared DPR (16KW × 32-bit). Performance: 16MFLOPS/PE. Host: VAX/VMS. Period: 1988 - available commercially. References: [371] [372]

R-256: (CR - NTT) Applications: General. Control: MIMD. Number of PEs: 256 (16×16). Type of PE: Custom, 80-bit FP VLSI, 30ns CP. Interconnection Network: 16 multistaged, reconfigurable switching units, each for one row and one column. Data sent in one dimension, then the other. Each two units are controlled by a Local Control Unit. LCUs are controlled by a Main Control Unit (MCU) via the MCU bus. IN BW: 2GB/s (+ a torus). Memory Organization: Local, 1.25MB/PE. Performance: 500MFLOPS sustained. Host: If the MCU is a workstation, the machine serves as an FPA. Period: Built in 1988/89. References: [139]

RAP: (C - Raytheon) Applications: Signal Processing. Control: SIMD. Number of PEs: ~4096. Type of PE: Custom, bit-serial. Interconnection Network: Perfect Shuffle + Linear. Memory Organization: Local, 1Kbit. Host: Sequential Control Unit. Period: 1974 - 64-PE prototype. References: [88]

RAP (Ring Array Processor): (A - International Computer Science Institute) Applications: Neural Network Calculation, Speech Recognition. Control: SIMD/SPMD/MIMD. Number of PEs: ~64 (~16 boards × 4 PEs/board). Type of PE: TI TMS320C30 DSP chips, 62.5ns CP. Interconnection Network: Ring (Layer Distribution Ring) + VMEbus among boards. Memory Organization: Local, 4-16MB/PE DRAM + 256KB SRAM. Performance: Peak: 32MFLOPS/PE; tested: ~16MFLOPS/PE. Host: MC68020-based host + Sun WS control, via Ethernet. Period: 1990 - 16-PE prototype; 1991 - 40-PE prototype. References: [295] [296]

RCA-215: (C - RCA) Applications: General. Control: MIMD. Number of PEs: 2-4. Type of PE: Custom, 36-bit word (32 - data, 4 - priority); 16-bit fixed point, 32/64-bit floating-point calculations. Interconnection Network: Crossbar with priority for I/O devices. Memory Organization: Global, 8 MMs × 16-32KW × 36-bit, 1.5μs cycle. Performance: 410KOPS/PE. Period: Early 1970s - available commercially. References: [112] [334]

RMIT: (A - Royal Melbourne Institute of Technology (Australia)) Applications: General. Control: Dataflow. Number of PEs: ~16. Type of PE: MC68000 or MC68020 + MC68881 FP coprocessor (1 or 2/PE, one for matching, one for executing). Interconnection Network: Multistaged network. Memory Organization: Local. Performance: 200K nodes/s. Period: 1990 - being built. References: [1]

RP3: (CR - IBM T. J. Watson Research Center) Applications: General. Control: MIMD. Number of PEs: ~512 Processor Memory Elements (PMEs) + 64 I/O Support Processors (ISPs). Type of PE: ROMP: 32-bit, 801-type (RISC) + FP support, 200ns CP + 32KB cache. Interconnection Network: Two networks: Ω network (combining) & SW-Banyan (low-latency), both PE-MM only, packet or circuit switched; SW-Banyan: BW - 12.8GB/s, latency 320ns for 8B packet; both message passing and shared variables are supported. Memory Organization: Distributed or local (partitionable), 4-8MB/PE. Performance: Peak: 1.3GIPS, 800MFLOPS; sustained: 1GIPS, 500MFLOPS. Period: 64-PE prototype built by 1988. References: [7] [8] [194] [261] [325]

RPA: (A - U. of Southampton) Applications: General. Control: SIMD. Number of PEs: 1024. Type of PE: Custom, 1-bit, 100ns CP. Interconnection Network: 4-nearest-neighbor torus, 20μs cycle, bidirectional; 16-bit-wide data paths (100MB/s overall BW). Memory Organization: Local, 64Kbit/PE, available at 1.28MB/s. Performance: 6MFLOPS measured on 32-bit multiplication, 18MFLOPS on 32-bit additions. Host: Yes. Period: 1986/87 - prototype. References: [312]

RST: (A - U. of Pisa (Italy)) Applications: General. Control: MIMD. Number of PEs: ~15. Type of PE: Fairchild/Intergraph Clipper chip, 32-bit, 30ns CP + 2 × 4KB caches + 256KB external cache. Interconnection Network: Bus. Memory Organization: Shared + Local, 32KB RAM + 128KB EPROM. Period: Implemented by 1991. References: [328]

RTP: (NL - RSRE (UK) + others in UK and France) Applications: General (Image Processing). Control: MIMD. Number of PEs: Several "Supernodes": 16 "worker" PEs + "Manager" PE. Type of PE: Transputers (T414 in prototype, T800 eventually), 66.7ns CP (T414). Interconnection Network: Hierarchy of 72×72 crossbars within node, among nodes (Transputer controlled) + control bus within node, Clos multistaged switching network among nodes. Memory Organization: Local, 128KB (T414) or 256KB (T800) per worker PE, 16MB/manager PE. Host: IBM PC or compatible. Period: 1988 - 16-PE (1 node) prototypes: one with T414 PEs, one with T800 PEs. References: [167] [402]

S-1 (Mark-I,IIA): (NL - Lawrence Livermore National Lab.) Applications: General. Control: Multiple VLIW. Number of PEs: 1-16. Type of PE: Custom, 996-bit instructions, 64KB data cache + 16KB instruction cache. Interconnection Network: Crossbar, 60-bit wide + bus (The Synchronization Box) for small data transfers. Memory Organization: Global: ~16 MMs, each 256MW (4 × 9-bit bytes/W). Performance: Peak: ~15MIPS/PE, 400MFLOPS/pipe, measured: 74MFLOPS/PE (Mark IIA). Period: 1977 - Mark I, 1984 - Mark IIA. References: [7] [118] [182] [183] [466]

Saxpy Matrix-1: (C - Saxpy Computer) Applications: General. Control: SIMD - Systolic Array. Number of PEs: 8-32. Type of PE: Custom, 32-bit + pipelined vector (FP × & +), 64ns CP. Interconnection Network: Linear (right neighbor) systolic data path + data broadcast. Bus ("Saxpy Interconnect") between matrix processor and other units, 320MB/s. Memory Organization: System: 16-128MW, 100ns cycle, 80MW/s BW; Matrix processor buffer: 128KW, 1.5GB/s BW + local: 4KW, 32ns cycle. Performance: Peak: 1GFLOPS. Host: System controller - VAX/VMS (executes the application program). Period: 1987/88 - available commercially; 8/1988 - closed down. References: [131] [259] [443]

SDFM: (CR - Stollman (Germany)) Applications: General. Control: Dataflow. Number of PEs: ≤17. Type of PE: MC68020. Interconnection Network: Bus (VMEbus). Memory Organization: Distributed, 4MB/PE, DPR. Host: One of the boards runs UNIX V.3. Period: 1988 - 1 host + 3-node prototype. References: [150]

Sequent Balance 8000 & 21000: (C - Sequent) Applications: General. Control: MIMD. Number of PEs: 2-12 (8000), 4-30 (21000), 2 CPUs/board. Type of PE: NS32032 + FPA, 100ns CP + 8KB cache. Interconnection Network: Three buses: system, 100ns cycle, 26.7MB/s, 32-bit data multiplexed with 28-bit address; System Link & Interrupt Control (SLIC) for synchronization & PE communications; Small Computer System Interface (SCSI). Memory Organization: Global, 2-28MB, 4 modules; access rate: 8 bytes/300ns. Performance: 0.7MIPS/PE. Period: 1985 - available commercially. References: [145] [194] [208] [215] [259] [263] [292] [422] [442] [443]

Sequent Symmetry, S81a & S81b: (C - Sequent) Applications: General. Control: MIMD. Number of PEs: 2-30 (2 CPUs/board). Type of PE: i80386/80387 + optional FPA (67ns CP) + 64KB cache: Write-through on S81a, Copy-back on S81b. Interconnection Network: Bus, 64-bit data multiplexed with 32-bit addresses, 100ns cycle (80MB/s peak, 53.2MB/s sustainable). Memory Organization: Global: ≤240MB. Performance: 3MIPS/PE (S81a), 4MIPS/PE (S81b). Period: 1987 - available commercially. References: [15] [270]

Sequoia, Sequoia Series 400: (C - Sequoia Systems) Applications: General. Control: MIMD. Number of PEs: 1-64; ≤32 (400). Type of PE: MC68010, 16-bit, and later MC68020, 50ns CP + 128KB Copy-back cache; 400: 2×MC68040 (for fault tolerance), 40ns CP, 8KB 1st-level cache, 1MB 2nd-level cache, set associative. Interconnection Network: Bus (Multibus). Memory Organization: Global: ≤252MB (1-128 MMs & I/O elements, 2MB/MM, at least two I/O elements); ≤4GB (400). Period: 1985 - available commercially; 1991 - Series 400. References: [259] [274]

SGI 4D-MP: (C - Silicon Graphics Computer Systems) Applications: Graphics. Control: MIMD. Number of PEs: 1-16 Computing Engines + 5 Geometry Engines, 1 Polygon Processor, 7 Edge Processors, 5 Span Processors, 20 Image Engines. Type of PE: MIPS R2000 CPU + MIPS R2010 FPU, 62.5ns CP in Computing Engine; the rest - custom. Interconnection Network: Hierarchical buses: Sync bus for synchronization, MP bus for data transfer (64-bit data + 32-bit address, 64MB/s) + data bus and address bus per PE, at 8B/cycle. Memory Organization: Global (including 64 Test & Set variables for synchronization) + 2 data and 1 instruction caches per PE (64KB each). Performance: Peak: (12MIPS + 8MFLOPS)/PE; sustained: 10MIPS; 1.6MFLOPS/PE on L100 (64-bit). Period: 1988 - available commercially. References: [34]

Siberia: (A - USSR Academy of Science, Siberian Division) Applications: General. Control: MIMD/SIMD/MSIMD. Number of PEs: 4 main subsystems: Central - 3 PEs; Vector Pipeline MIMD (VPM): 8 + 4 optional; Vector Parallel SIMD (VPS): 4-8 arrays of 8-64 PEs; Associative SIMD (AS): 4 Associative Memory units + 1K PEs. Type of PE: Central: ES-1066 mainframe (Soviet, IBM-like); VPM: ES-2706 Array Processors (Bulgarian, AP-190L-like) + ES-2709 (optional); VPS: PS SIMD Array Processors (Soviet) - 8 PS-2100 or 4 PS-2000, each with 8-64 PEs; AS: 1-bit PEs. Interconnection Network: Direct links among Central subsystem PEs; radial links to VPM PEs + shared memory links via specialized network; radial links to VPS PEs; radial connection from AS controller to AMUs. Memory Organization: Shared Multiported RAM among Central PEs; shared among VPM PEs through specialized network; shared among PEs within each VPS + bulk storage. Performance: Central: 36 MIPS; VPS: 200 MIPS; VPM: 72 MFLOPS/ES-2709. Period: Built by 1990, except ES-2709. References: [289]

SiDBM: (CR - AT&T Bell Labs.) Applications: Database. Control: MIMD. Number of PEs: ~30: 1 Query Manager, 1 or more Query Processors, 1 or more Host Interface Processors, 1 or more Relation Managers. Type of PE: AT&T 32100. Interconnection Network: Bus (VMEbus). Memory Organization: Global, ~64MB in 4 MMs + Distributed, ~1MB/PE, DPR. Host: A UNIX Machine. Period: 1987/88 - 2 prototypes built, the first having 5 MC68000 processors, 2MB global memory + 128KB/PE + Sun host. References: [253]

Sigma-1: (NL - Electrotechnical Laboratory (Japan)) Applications: General. Control: Dataflow. Number of PEs: 256 (32 groups of 8). Type of PE: Semi-custom, pipelined, 100ns CP. Interconnection Network: Two-level hierarchical network: Local - 8-PE, 10×10 crossbar, with 600MB/s BW; Global - 32 groups, connected by a 2-stage Ω network, 2GB/s BW. Memory Organization: Local: 96K packets (88-bit). Performance: Peak: 3MFLOPS/PE, tested: 170MFLOPS, 640MIPS/128 PEs. Host: Yes. Period: 1984 - PE prototype; 1985 - PE redesign; 1/1988 - 128-PE prototype. References: [180] [301] [476]

SIGMA-9: (C - Xerox Data Systems) Applications: General. Control: MIMD. Number of PEs: 1-4 + 1-11 I/O Processors (≤12 in all). Type of PE: Custom, 32-bit. Interconnection Network: Multiple buses, 1 bus/PE. Memory Organization: Global, 4-16 MMs × 32KW (32-bit + parity), 900ns cycle + local, 1KW/PE. Period: 9/1971 - available commercially; previous SIGMA versions: 7 - 1966; 5 - 1967; 6 - 1970. References: [112] [473]

SKYbolt-mp: (C - SKY Computers Inc.) Applications: Graphics, General. Control: MIMD. Number of PEs: ≤4. Type of PE: i860. Interconnection Network: VMEbus. Memory Organization: Local, 16MB/PE. Performance: ≤320MFLOPS. Host: Sun-3, 4. Period: 1991 - available commercially. References: [340]

(SM)2-II: (A - Keio U. (Japan)) Applications: General (Sparse Matrices Solving). Control: MIMD. Number of PEs: ≤50. Type of PE: IMPULSE architecture: Task Engine (MC68000), Inter-Process Communication Engine (IPC), Floating Point Processing Engine (MC68881). Interconnection Network: Bus (through Receiver Selectable Multicast Units) + Linear. Memory Organization: Distributed, 2MB/PE. Period: 1986 - 20-PE prototype. References: [10] [301] [346] [474]

SMS 201: (C - Siemens (Germany)) Applications: Weather Forecasting. Control: MIMD. Number of PEs: 128. Type of PE: 8-bit microprocessor. Interconnection Network: Bus. Memory Organization: Local, 13KB/PE + 5KB in communication memory. Performance: Peak: 38.4MIPS. Host: A minicomputer. Period: First delivered in 1977. References: [230] [233]

SNAP-1 (Semantic Network Array Processor): (A - U. of Southern California) Applications: Knowledge processing, Natural language understanding. Control: SIMD or MIMD. Number of PEs: 160: 8 boards × 4 clusters × 5 PEs (1 Processing Unit, 3 Marker Control Units, 1 Communication Unit). Type of PE: TI TMS320C30 DSP chip, 32-bit. Interconnection Network: Via shared memory: 2 × four-port RAM for intracluster communication (1 for 3 MUs + PU, 1 for 3 MUs + CU) + Spanning bus, hypercube emulated by a multiported RAM. Memory Organization: Partly shared (marker-passing memory in cluster) + Local (message passing between clusters). Host: Sun-3/280 + SNAP-1 controller (via VME bus): containing Program Control and Sequence Control processors. Period: 1991 - prototype built. References: [100] [101]

SPARCserver 10/SPARCserver 600MP/SPARCstation 10 families: (C - Sun Microsystems Computer Corp.) Applications: General. Control: MIMD. Number of PEs: 1-4. Type of PE: TI SuperSPARC, 22.2ns CP on multi-PE versions + 1MB cache. Interconnection Network: MBus. Memory Organization: Shared, ≤512MB. Performance: Peak: 250 tps (600MP/54), ~400MIPS for all 4-PE versions. Period: 9/1991 - 600MP family introduced; 5/1992 - SPARCserver 10, SPARCstation 10 announced. References: [425]

SPDS (SIMD Processor Development System): (CR - Amber Engineering) Applications: Signal Processing, Image Processing. Control: SIMD. Number of PEs: 2304-10,368 (1, 2 or 4 GAPP array boards, each with 32 or 36 chips × 72 PEs). Type of PE: GAPP PE. Interconnection Network: 2-D array. Memory Organization: Local. Performance: ~370KIPS/PE (8-bit +); 28.2KIPS/PE (8-bit ×); 6.8KIPS/PE (16-bit ×). Host: PC-AT compatible + SIMD controller + frame grabber. Period: Prototype built by 1989. References: [211]

Splash: (A - SRC) Applications: Pattern Matching. Control: Systolic-array-like SIMD. Number of PEs: 256-768: 32 chips × 8-24 cells/chip + 2 control chips. Type of PE: Xilinx 3090 FPGA, each 320 (16×20) Configurable Logic Blocks + 144 I/O Blocks. Interconnection Network: Linear array. Memory Organization: Distributed; 32 memory chips, one between each two FPGA chips (in addition to direct links), each 128KB, 50ns cycle + 8MB VME subsystem bus staging memory for I/O. Host: Sun-3 or Sun-4, via VME bus. Period: 1989 - first prototype, end of 1990 - 16 machines. References: [151]

SPUR: (A - UC Berkeley) Applications: Symbolic Processing. Control: MIMD. Number of PEs: 6-12. Type of PE: Custom, 32-bit RISC processor + 128KB cache. Interconnection Network: Bus (modified TI NuBus), 37.5MB/s. Memory Organization: Global, shared variables. Period: 1986/87 - prototype. References: [218]

SSM (Shuffle-Shift Machine): (CR - Philips Forschungslaboratorium (Germany)) Applications: Image Processing. Control: MIMD. Number of PEs: 16. Type of PE: INMOS T800, 50ns CP. Interconnection Network: Shuffle & Shift Networks, 1.8MB/s per link peak BW, message passing. Memory Organization: Local, 1MB DRAM/PE, 100ns cycle. Host: PC, i80386, 50ns CP + display system. Period: Built by 1990. References: [394]

STARAN-B,E: (C - Goodyear) Applications: Signal Processing. Control: SIMD. Number of PEs: 32 Array Modules, 256 PEs each. Type of PE: Custom, bit-serial. Interconnection Network: Batcher sorting network or Flip network. Memory Organization: Local, 256 x 256-bit (STARAN-B), 9216 x 256-bit (STARAN-E) per array module, 150ns read, 250ns write (80MB/s access rate on B, 215MB/s on E). Performance: 11.5MOPS (16-bit add) to 48MOPS (16-bit search) on B; 15.4MOPS to 60.6MOPS on E. Host: HIS-645 or Sigma-5. Period: 1972 - STARAN-B delivered; 1975 - STARAN-E. References: [112] [123] [284] [341] [342]

SuperSet Plus family: (C - Computer System Architects) Applications: General. Control: MIMD. Number of PEs: 16-256, in 16 clusters. Type of PE: Transputers: T425, 62.5ns CP; T800/5, 25ns CP. Interconnection Network: Reconfigurable. Memory Organization: Local, 256KB-32MB/PE. Performance: 160MIPS/16 PEs (T425), 3840MIPS/256 PEs (T800). Host: SUN, Mac or PC. Period: 1991 - available commercially. References: [84] [86]
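The quoted aggregates correspond to the following per-PE rates (simple division, derived here for comparison rather than taken from the references):
\[ 160/16 = 10\ \text{MIPS/PE (T425)}, \qquad 3840/256 = 15\ \text{MIPS/PE (T800)}. \]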

Suprenum-1: (C - Suprenum (Germany)) Applications: General. Control: Multiple Vector Processors. Number of PEs: 16 clusters x (16 PEs + 1-2 communication nodes, disk controller node, diagnosis node). Type of PE: MC68020 (32/64-bit) + MC68882 with vector unit - Weitek WTL2264/65, 50ns CP + MC68851 PMMU + 64KB vector cache + communication unit. Interconnection Network: Two-stage, message-passing bus system: intracluster bus (2x64-bit parallel buses, 160MB/s each) + intercluster 2-D buses: horizontal & vertical bit-serial Suprenum buses, based on UPPER ringbus, 280Mbit/s. Memory Organization: Local: 2-8MB/PE. Performance: Each PE: 2MIPS (scalar) + 10-20MFLOPS (the higher figure is achieved when using chaining). Host: Threesome: programming computer, OS computer & maintenance computer. Period: 1989 - prototype. References: [149] [320] [398] [446]
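A rough aggregate peak, not quoted in the entry itself, follows from the figures above (assuming the full 16 x 16 arrangement of compute PEs and the stated per-PE vector rate):
\[ 16 \times 16 \times (10\text{--}20\ \text{MFLOPS/PE}) \approx 2.6\text{--}5.1\ \text{GFLOPS}. \]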

SX-3 or SX-Xab (a = number of APs, b = number of sets of pipelines/AP): (C - NEC) Applications: General. Control: Multiple Vector Processors. Number of PEs: ≤4. Type of PE: Custom, 2.9ns CP, containing ≤4 vector pipe sets x 4 functional pipes each (2 add/shift, 2 multiply/Boolean) + 64KB cache. Interconnection Network: Via communication registers (3KB). Memory Organization: Global, ≤64GB (8GW x 64-bit), 20ns cycle + Local, 144KB vector registers + 1-16GB (≤2GW) Extended Memory. Performance: 5.5GFLOPS/PE peak, 220MFLOPS/4 pipes on L100, 13.4GFLOPS/4 PEs on L1000, 20GFLOPS/4 PEs on Linpack; 39MFLOPS/PE on LFK. Period: 1990 - available commercially. References: [42] [119] [373] [460]
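A consistency check derived from the figures above (assuming one floating-point result per functional pipe per cycle; this derivation is not part of the cited references):
\[ \frac{4 \times 4\ \text{pipes}}{2.9\ \text{ns}} \approx 5.5\ \text{GFLOPS/PE}, \qquad 4\ \text{PEs} \times 5.5\ \text{GFLOPS} \approx 22\ \text{GFLOPS peak}. \]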

SYDAMA: (A - Swiss Federal Institute of Technology (ETH), Electronics Lab.) Applications: Image Processing. Control: Dataflow. Number of PEs: ≤256. Type of PE: Custom, of various types and functions, based on LUTs. Interconnection Network: 12 circulating buses, 8-bit wide, 7.5MB/s each. Memory Organization: Local + 256x256-pixel Image Memory. Performance: 50 frames/s, 300x400 pixels each. Host: IBM-PC/AT or Macintosh II. Period: 1989 - prototype. References: [158]

SY.MP.A.T.I. 1 & 2: (A + NL - Paul Sabatier U., Toulouse + CEN (France)) Applications: Image Processing. Control: MSIMD. Number of PEs: 16. Type of PE: Custom, 8-bit (S.-1), 16-bit (S.-2). Interconnection Network: Shifting Loop (Linear). Memory Organization: Local, storing part of the image. Period: Late 1980s - SY.MP.A.T.I. 1. References: [33] [114]

Synapse N+1: (C - Synapse) Applications: Database/Transaction Processing. Control: MIMD. Number of PEs: 28. Type of PE: MC68000 + 16KB cache. Interconnection Network: Buses - 2 "Synapse Expansion Buses": data (32-bit) & address (24-bit). Memory Organization: Global, ≤16MB. Performance: 1.5TPS/PE. Period: 1985 - available commercially; defunct by 1987. References: [305] [381] [443]

Taiwan National University Hypercube (adopted name): (A - Taiwan National U.) Applications: Lattice Gauge Problems. Control: Multiple Vector Processors. Number of PEs: ≤256: 32 groups x 8 PEs. Type of PE: MC68020 + MC68882, 62.4ns CP + Weitek FP chips. Interconnection Network: Hierarchy of buses: VMEbus within group, VMSbus among groups. Memory Organization: Shared within group + Local: 1MB on-board SRAM + 4MB DPR/PE + 32KW x 32-bit extra-fast on-board SRAM. Host: Yes. Period: 6/1987 - 4-PE prototype running. References: [76]

Tandem NonStop Series (TXP, CLX, VLX): (C - Tandem) Applications: Database. Control: MIMD. Number of PEs: 2-8 (CLX), 2-16 (I, II, VLX), 2-32 (TXP). Type of PE: Custom, 16-bit, 83.3ns CP (TXP). Interconnection Network: 2 "Dynabus" buses ("X," "Y"), 32-bit, message passing (packets: ≤65KW), circuit switched. Memory Organization: Local, ≤2MB, accessible at 4MB/s. Performance: 3MIPS/PE, ~208TPS (32 PEs). Period: 1976 - NonStop I, 1981 - NonStop II, 1983 - NonStop TXP. References: [8] [112] [219] [382] [443]

TDM: (A - Harbin Institute of Technology (PRC)) Applications: Database. Control: Multiple SIMD. Number of PEs: 1-2 Sub-Tree Elements (STEs), each with 1 Backend Controller/Processor (BCP) + ≤8 Slave Processors (SPs). Type of PE: BCP - Custom; SP - i8088. Interconnection Network: Tree (not binary). Memory Organization: Distributed within STE, with shared variables + 1 disk/SP. Host: Yes, possibly multiple. Period: April 1987 - 1-BCP, 3-SP prototype. References: [285]

Teradata Database Computer DBC/1012: (C - Teradata) Applications: Database (Relational). Control: MIMD. Number of PEs: ≤1024, only at the leaves, 8 PEs/cabinet. Type of PE: i8086 on model 1; i80286/80287, 62.5ns CP on model 2; i80386/80387, 50ns CP on model 3. Interconnection Network: Tree, message passing ("Y-Net"). Memory Organization: Local. Host: Yes. Period: Available since mid-1980s; 7/1988 - model 3. References: [8] [383]

TIP-3: (CR - NEC C&C Systems Research Laboratory) Applications: Image Processing. Control: Dataflow. Number of PEs: 8. Type of PE: Image Pipelined Processor (ImPP, NEC µPD7281) chip, 200ns per pipeline stage. Interconnection Network: Ring Bus, connected via the MAGIC (Memory Access General bus-Interface Chip) to the Controller's bus (Internal Bus) and Memory Bus, using 16-bit data tokens. Memory Organization: Global, 2 MMs x 1MW (18-bit) Image Memory, 600ns cycle + Local, 512KW/PE. Host: PC9800 host + MC68000-based controller (Process Support Unit - PSU). Period: 1984 - ImPP chip; 1985 - prototype. References: [204] [432]

Titan II (Stardent 1500), Titan III (Stardent 3000): (C - Ardent (later Stardent)) Applications: Graphics. Control: MIMD. Number of PEs: 1-4. Type of PE: Titan II: MIPS R-2000, 32-bit RISC chip, 62.5ns CP + 64-bit vector unit with Weitek WTL2264/5 chip set (125ns CP for the VU) + 32KB cache; Titan III: MIPS R3000 + R3010 FPA, 31.25ns CP + 64KB I-cache, 64KB D-cache + BIT chip-based vector processor. Interconnection Network: Two split (data/address) buses: R - read only, S - read/write; each transfers 128 bits/cycle (62.5ns), or 256MB/s. Memory Organization: Global, 8-16-way interleaved, 4 MMs with 32MB (using 1Mbit chips) or 128MB (4Mbit chips) each, BW: 256MB/s. Performance: Titan II - Peak: 64MIPS or 64MFLOPS; tested: 10MIPS/PE, 6MFLOPS/PE on L100, 12MFLOPS/PE on L1000, 1.7MFLOPS/PE on Livermore Loops; Titan III - Peak: 198MIPS or 128MFLOPS/PE; tested: 8MFLOPS/PE on L100, 78MFLOPS/PE on L1000, 4.7MFLOPS/PE on Livermore Loops. Period: 1988 - available commercially; 1989 - Titan III. References: [106] [264] [384] [385]
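The Titan II peak figures are consistent with the clock rates above for the full four-processor configuration, assuming one instruction per scalar cycle and one add plus one multiply per vector-unit cycle (a derivation made here, not quoted from the references):
\[ 4 \times \frac{1}{62.5\ \text{ns}} = 64\ \text{MIPS}, \qquad 4 \times \frac{2}{125\ \text{ns}} = 64\ \text{MFLOPS}. \]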

TN (Toroidal Net): (NL - IRECE, Italy) Applications: Signal Processing. Control: MIMD. Number of PEs: ≤128. Type of PE: INMOS T800, 33ns CP; eventually - Vectram Transputers. Interconnection Network: Torus. Memory Organization: Local. Performance: Peak: 15MIPS, 2.25MFLOPS/PE. Host: Yes. Period: 8-PE model implemented by 1989. References: [136]

TOMP: (A - Torino Polytechnic) Applications: General. Control: MIMD. Number of PEs: ≤4. Type of PE: Optional: Zilog Z8001 or MC68000 or NS32000. Interconnection Network: Bus. Memory Organization: Distributed (DPR) + local. Period: 1982 - prototype. References: [99]

TOP-1: (CR - IBM Research (Japan)) Applications: Graphics. Control: MIMD. Number of PEs: ≤11 (one for hard disk management). Type of PE: i80386 + Weitek 1167 + 128KB snoopy cache. Interconnection Network: Dual buses, 64-bit wide, 2-way interleaved, 16ns cycle, 85MB/s effective data transfer rate, used for memory access & message passing. Memory Organization: Shared, ≤128MB (≤8 MMs x 16MB), 5.3ns cycle. Host: IBM PS/2-80, through the Micro Channel. Period: 1989 - prototype. References: [426]

Topology 100: (C - Topologix) Applications: General. Control: MIMD. Number of PEs: 4-128 (4/board). Type of PE: T800 Transputer, 50ns CP (eventually, CPs in the 28.6-57.1ns range) + 4KW, 50ns access cache. Interconnection Network: Reconfigurable, based on Inmos C004 link crosspoint; 2 networks: SYSNET for control, network switching, boards & host interface, controlled by one T212 per board; APNET for applications' communication; message passing, 20Mbit/s per link. Memory Organization: Local, 1-16MB/PE, 200ns cycle, 100ns read access, 20MB/s BW. Performance: Peak: (20 RISC MIPS + 1.6MFLOPS)/PE; 4MWhetstone/PE (all at 50ns cycle). Host: Sun 3/Series, connected via VMEbus to PEs' DMA. Period: November 1988 - announced. References: [440]

TOPSTAR: (A - U. of Tokyo) Applications: Pattern Recognition (General). Control: Dataflow. Number of PEs: 16 Processing Modules (PMs) + 8 Communication & Control Modules (CMs). Type of PE: Zilog Z-80 (PMs and CMs). Interconnection Network: Each PM connected to ≤4 CMs; each CM connected to ≤8 PMs. Memory Organization: Distributed. Period: 1982 - TOPSTAR I, 3 PMs & 2 CMs, prototype of TOPSTAR II. References: [173]

TOPSY: (A - City U., London) Applications: General. Control: MIMD. Number of PEs: ≤256. Type of PE: MC68030, 62.5ns CP. Interconnection Network: Torus, circuit-switched message passing, peak BW: 12MB/s per channel. Memory Organization: Local, 8MB/PE. Period: 16-PE prototype constructed by 1992. References: [416]

Torus: (CR - Philips Research Lab. (Netherlands)) Applications: General. Control: MIMD. Number of PEs: 36. Type of PE: Custom, "simple." Interconnection Network: 4-nearest-neighbor torus, bit-serial links. Memory Organization: Local, 256B/PE. Period: 1979 - operational. References: [276]

TRAC Series: (A - U. of Texas at Austin) Applications: General. Control: Partitionable SIMD/MIMD. Number of PEs: 4 (TRAC 1.0) - 8 (TRAC 1.5) (2^n). Type of PE: TRAC 1.0: 2xAMD2901, 8-bit; Inter2910 bit-slice, 1µs CP on later models. Interconnection Network: SW-banyan network, 8-bit-wide data paths, circuit switched or packet switched, reconfigurable during execution. Memory Organization: Global, 9 (TRAC 1.0) - 27 (TRAC 1.5, 2.0) MMs (3^n), 4-64KB each (1MB in TRAC 2.0). Period: 1982 - 4-PE model (TRAC 1.0). References: [7] [8] [182] [183]

TRACE/500 Family: (C - Multiflow Computer Inc.) Applications: General. Control: Multiple VLIW. Number of PEs: ≤4. Type of PE: Custom; containing: 2 Integer Clusters (2 ALUs each, 32-bit + 64W D-cache), 1 FP Section (2 FALUs, 64-bit + 128W D-cache), Sequencing and Control Unit (+ 3 I-caches); 15ns CP. Interconnection Network: Multiple buses, 32-bit - 2 memory load buses, 2 store buses, 1 address bus, 2 buses among PEs; router within PE. Memory Organization: Shared, 32-1024MB in 2 Memory Controllers x 1-4 Quadrants x 4 Cards x 8 banks of 1MB, 40ns or 60ns cycle; BW: ~1GB/s. Performance: 14/500: 78MFLOPS/PE on L100, 120MFLOPS/PE on L1000, 29MFLOPS on LFK 24; 28/500: 113MFLOPS/PE on L100, 216MFLOPS/PE on L1000, 32MFLOPS/PE on LFK 24. Period: 1990 - available commercially. References: [53]

Transition Machine: (C - Boeing) Applications: General. Control: MIMD. Number of PEs: 8-100. Type of PE: 16-bit microprocessor. Interconnection Network: 3 buses: control, memory and I/O. Memory Organization: Global, multiple MMs. Period: 1984 - prototype. References: [282]

Trusted Multiple Microcomputer: (C - Gemini Computers) Applications: General. Control: MIMD. Number of PEs: 1-8. Type of PE: 80286. Interconnection Network: Bus. Memory Organization: Global, ≤128MB. Period: 1985 - available commercially. References: [259]

TRW Mk III: (C - TRW) Applications: Neural Systems Simulation. Control: MIMD. Number of PEs: ≤15. Type of PE: MC68020/68881. Interconnection Network: Bus (VMEbus), 1.13M interconnects possible. Memory Organization: Local. Performance: 500K interconnections/s. Host: VAX/VMS. Period: 1987 - available commercially. References: [241] [437]

Tumult-x: (A + CR + NL - Twente U. + Océ + Dr. Neher Labs. (Netherlands)) Applications: General. Control: MIMD. Number of PEs: x=15; later x=64. Type of PE: MC68000, 125ns CP or MC68020, 62.5ns CP. Interconnection Network: Ring, unidirectional on Tumult-15, bidirectional on Tumult-64, message passing, with message switching element at each PE, 100ns cycle. Memory Organization: Local. Period: Tumult-15 built by 1988, Tumult-64 under construction in 1988. References: [404]

Ultracomputer: (A - New York U.) Applications: General. Control: MIMD. Number of PEs: 8-4096. Type of PE: MC68010, 100ns CP + Weitek WTL1164/65 FPU + cache on Ultra II; AMD Am29000 planned for Ultra III. Interconnection Network: Combining Omega network, packet switching (Bus on the prototype). Memory Organization: Global (4 MMs of 2MB each on prototype). Performance: 10MIPS/PE (Transputers). Period: 1983 - 8-PE, 4-MM prototype. References: [7] [8] [110] [144] [157] [182] [183] [261]

UNIVAC 1100 family: (C - Sperry Rand) Applications: General. Control: MIMD. Number of PEs: 2 (1108), 4 (1110, 1100/80, 1100/90). Type of PE: Custom, 36-bit, 375ns CP (1108), 150ns CP (1110), 100ns CP (1100/80), 30ns CP (1100/90). Interconnection Network: Via multiported memory + fully interconnected interrupt lines (until 1100/80), Bus (1100/90). Memory Organization: Global, 4 MMs (64KB/MM on 1108 & 1110, 0.5-1MB on 1100/80, 16MB on 1100/90). Performance: 27MIPS for 1100/90. Period: 1968 - 1108, 1972 - 1110, 1977 - 1100/80, 1982 - 1100/90. References: [8] [32] [112] [194] [356, chapter 5]

VAX 6240: (C - DEC) Applications: General. Control: MIMD. Number of PEs: 4. Type of PE: Custom + hierarchical cache: primary, on-chip, 1KB, instruction only; secondary, off-chip, 256KB, data + instruction. Interconnection Network: Bus. Memory Organization: Shared, 128MB. Performance: 2.8 VAX MIPS. Period: 1990 - available commercially. References: [90] [212]

VAX 9000: (C - DEC) Applications: General. Control: Multiple Vector Processors. Number of PEs: ≤4 + Service Processor. Type of PE: Custom, with optional vector processor, RISC-based implementation of CISC instruction set + 128KB cache. Interconnection Network: Crossbar between PEs, Service Processor, memory and I/O controllers. Memory Organization: Shared, ≤256MB/MM, 2 memory controllers, available at 500MB/s (read) and 250MB/s (write). Performance: Peak: 40 VUPS/PE; tested: 157VUPS/4 PEs. Period: 10/1989 - available commercially; ≥75 sold by 9/1990. References: [130]

Victor: (CR - IBM Yorktown Heights) Applications: General. Control: MIMD. Number of PEs: ≤256 (16x16). Type of PE: Inmos T800 Transputer. Interconnection Network: 4-nearest-neighbor grid, message passing. Memory Organization: Local, 4MB/PE. Host: IBM-PC/RT or IBM-PC/AT. Period: 1988 - 64-PE prototype; 1989 - 256-PE prototype. References: [408]

VOLVOX TS-800, IS-860: (C - ARCHIPEL (France)) Applications: General. Control: MIMD. Number of PEs: 64 for the TS-800, 48 for the IS-860. Type of PE: Transputer T805 for the TS-800, Intel i860 for the IS-860 + T805 communication processor. Interconnection Network: Ring of T222 Transputers forms the Service Processor Network that handles message-passing routing and virtual interconnections. Memory Organization: Local, ≤16MB/PE. Performance: 96MFLOPS for 64-PE TS-800; 2952MFLOPS for 48-PE IS-860. Period: 1990 - available commercially. References: [19]

VP-2x00/y0 (y = no. of scalar units): (C - Fujitsu) Applications: General. Control: Multiple Vector Processors. Number of PEs: ≤4 scalar processors + 2 vector processors. Type of PE: Custom, 4 or 3.2ns CP (on vector processor), 1-2 vector pipelines/vector processor. Interconnection Network: Via shared memory. Memory Organization: Global + System Storage Unit (≤32GB). Performance: ≤5GFLOPS/8 vector pipes @ 3.2ns cycle. Period: 8/1990 - announced; 1991 - available commercially. References: [374]
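The peak figure is consistent with each of the eight vector pipes delivering two floating-point results (a multiply-add) per cycle, an assumption made here purely as a check:
\[ \frac{8\ \text{pipes} \times 2\ \text{FLOPs}}{3.2\ \text{ns}} = 5\ \text{GFLOPS}. \]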

VPP: (CR - Toshiba) Applications: Image Processing (General). Control: Multiple Vector Processors. Number of PEs: ≤64. Type of PE: Custom, vector ALU. Interconnection Network: "Source-Destination Loop": 4-nearest-neighbor torus with communication nodes at intersections (one per device - PEs, host, I/O processor); message passing, with data transfer completed in two PE cycles - one for column (source ring) transfer, the next for row (destination ring) transfer. Memory Organization: Local, Program and Data memories. Host: Front End Processor. Period: 1986 - 2-PE prototype; 1987/88 - 8 PEs; 1988 - S-D loop network. References: [201]

Vulcan: (CR - IBM Yorktown Heights) Applications: General. Control: MIMD. Number of PEs: ≤32,000. Type of PE: i860. Interconnection Network: Multistage, custom 4 x 4 bidirectional switch. Memory Organization: Local, 32MB/PE. Host: RS/6000. Performance: Peak 50MB/s bidirectional per processor; the switch has a peak bandwidth of 50MB/s and a minimal latency of 100ns per stage. Period: 1992 - 20-PE prototype. References: [409]
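If the multistage network is built purely from the 4 x 4 switches, the stage count for N nodes grows as roughly ⌈log₄ N⌉ (an assumption made here; the entry does not state the stage count), which for the largest configuration suggests:
\[ \lceil \log_4 32{,}000 \rceil = 8\ \text{stages} \times 100\ \text{ns} \approx 0.8\ \mu\text{s minimum end-to-end latency}. \]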

WARP & iWARP: (A + C - CMU + GE (iWARP - Intel)) Applications: Image Processing. Control: MIMD/Systolic. Number of PEs: ~10 "Cells" on WARP; ≤1024 (32x32) on iWARP. Type of PE: Custom: FP x & +, communication queues (X, Y, Adr), data crossbar, address generating unit, 50ns CP on iWARP. Interconnection Network: Linear (to/from neighboring cells), two channels (X, Y): 80MB/s data, 40MB/s addresses; 8-nearest-neighbor on iWARP, BW: 320MB/s per channel. Memory Organization: Local, 32Kx32-bit + 2Kx32-bit (WARP); BW: 160MB/s (iWARP). Performance: WARP - Peak: 10MFLOPS/PE; measured: 79MFLOPS/10 PEs on L100; iWARP - Peak: 20MFLOPS, 20MIPS/PE. Host: UNIX machine (Sun-3) + External Host (I/O). Period: 1986 - WARP prototype; 5/1990 - 12-PE iWARP; 8/1990 - 3 x 64-PE systems. References: [8] [14] [63] [194] [243] [322] [459]
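Aggregate peak rates implied by the per-PE figures above (derived here for orientation, not quoted in the entry):
\[ 10 \times 10\ \text{MFLOPS} = 100\ \text{MFLOPS (WARP)}, \qquad 1024 \times 20\ \text{MFLOPS} \approx 20\ \text{GFLOPS (full 32x32 iWARP array)}. \]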

Wavefront Array Processor (WAP): (A - USC) Applications: Signal Processing. Control: Dataflow/Systolic. Number of PEs: 64 (8 x 8). Type of PE: NEC µPD7281, 200ns CP. Interconnection Network: Grid. Memory Organization: Distributed, partially shared (row/column memory unit). Period: Late 1980s - prototype. References: [453]

WRM: (CR - IBM) Applications: Wire Routing. Control: MIMD (SPMD). Number of PEs: 64 (8 x 8). Type of PE: Z80A - Master/Slave, 8-bit, 250ns CP. Interconnection Network: Torus, message passing + communication controller. Memory Organization: Local, 15KB. Host: Control Processor. Period: 1980 - operational. References: [7] [8]

X1 (CNAPS): (C - Adaptive Solutions Inc.) Applications: Neural Networks. Control: SIMD. Number of PEs: 64 PEs/chip. Type of PE: Custom, 16-bit logic, 9x16-bit multiplier, 32-bit add, 40ns CP. Interconnection Network: Linear. Memory Organization: Local, 4KB weight storage. Performance: 1.6G connections computed/s per chip (8- or 16-bit weights), 260M connection updates/s per chip. Period: Implemented by 1990. References: [164]
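The connections-per-second figure matches one multiply-accumulate per PE per cycle at the stated clock (a consistency check derived here, not taken from the reference):
\[ 64\ \text{PEs} \times \frac{1}{40\ \text{ns}} = 1.6 \times 10^{9}\ \text{connections/s per chip}. \]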

XTM: (C - Cogent) Applications: General. Control: MIMD. Number of PEs: 2-400. Type of PE: Transputer T800-20. Interconnection Network: Crossbar, dynamic, T800-controlled. Memory Organization: Local, 4MB/PE. Performance: 45MFLOPS + 150MIPS measured for 32 PEs. Host: Server Processor (one of the Transputers). Period: 1988/89 - available commercially. References: [117] [455]

YSE (EVE): (C - IBM) Applications: Logic Simulation. Control: Multiple VLIW. Number of PEs: 256. Type of PE: Custom, 80ns CP. Interconnection Network: Crossbar connection. Memory Organization: Local, data memory and instruction memory. Performance: >3G gate computations per second. Host: Yes + Control Processor. Period: 1982 - 16-PE prototype; by 1988 - 8 full-scale machines. References: [7] [8]
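Assuming each PE evaluates one gate per cycle, the configuration above gives (a derivation made here as a check on the quoted rate):
\[ 256 \times \frac{1}{80\ \text{ns}} = 3.2 \times 10^{9}\ \text{gate computations/s}. \]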

Yuppie: (CR - IBM T. J. Watson Research Center, Yorktown Heights) Applications: Image Processing. Control: SIMD. Number of PEs: ≥16. Type of PE: Custom, bit-serial, 16/chip. Interconnection Network: Polymorphic torus: dynamically reconfigurable, based on a 4-nearest-neighbor torus. Memory Organization: Local, 4Kbit/PE. Period: Chip built by 1988. References: [272]

ZMOB: (A - U. of Maryland) Applications: Production Systems & Logic Programming. Control: MIMD. Number of PEs: 256. Type of PE: Z80 + 32-bit FP coprocessor. Interconnection Network: Ring "Conveyor Belt", 48-bit wide (8 - control, 16 - data, 12 - source, 12 - destination), 100ns cycle. Memory Organization: Local, 64KB, 300ns cycle. Host: VAX-11/780. Period: 1982 - prototype. References: [182] [183] [444] [449] [464]

Index

(SM)2-II, 27, 242 Applications, 10, 31 3081KX, 196 Applied Intelligent Systems Inc., 148, 228 3084QX, 196 APS Family, 24, 152 3090 Series, 196 APx, 26,153 6180 Multics, 194 ARCHIPEL, 255 Aachen Technical University, 206 Architectures, 17 AAP, 28, 42 Ardent, 249 AAP-1,2, 146 Ardent's Titan, 35 AAPT, 28, 146 ARES, 28, 33, 37, 50, 153 ACE, 25, 63, 146 Argonne National Laboratories, 68, 204 ACP, 27, 147 Armstrong, 24 Actors, 8 Armstrong I, 153 Adaptive Antenna Processor Testbed, 146 Artificial intelligence, 32 Adaptive Solutions Inc., 258 AS9080, 218 Advanced Computer Program, 121 ASAP, 28, 53, 154 Advanced Numerical Research and Analysis ASP, 23, 50,53, 154 Group, 222 Aspex Microsystems, 154, 227 Advanced Processor Technology Laboratory, Athas, 17 183 ATOMS, 26, 58, 155 AGM, 25,147 AT&T Bell Laboratories, 155, 176,240 AHR, 26, 147 AIS-5000, 28, 148 B-HIVE, 23, 156 ALAP, 23, 104, 148 B-SYS, 30, 158 Alewife, 119 B6700, 159 ALICE, 30, 43, 54, 148 B7700, 49, 159 Alliant, 47, 149 Balance 8000 & 21000, 238 Alliant FX, 26 Bandwidth, 13 Alliant FX/8, 52, 92 BBN Advanced Computers, 155, 156, 228 Alliant FX/Series, 12, 100, 110 BBN Butterfly, 10, 25, 90, 101 Amber Engineering, 243 Beijing Polytechnic University, 215 Amdahl,149 BiiN, 26, 53, 156 Amdahl 5995, 27,149 BLITZEN, 28,157 Ametek,150 Blocking, 14 Ametek 2010, 26, 43, 82, 150 Boeing, 252 Ametek System/14, 24, 103, 112, 150 British Robotics Systems, 203 AMP-I, 24, 150 Brown University, 153, 158 AMT,l71 Browne, 8 ANMA, 23, 57, 81, 151 Brunei University, 154 Annual International Conference on BSP, 26,157 Computer Architecture, 18 BTl 8000, 24, 158 ANURAG,222 BTl Computer Systems, 158 Ap2S, 28, 152 Bulgarian Academy of Science, 152 AP1000, 23, 24, 151 Bull,38 Apollo 10000, 25, 101, 152 Bull DDC, 30, 42


Bull SA, 158 Communicating processors, 8 Bureau of Standards, 227 Complex control pattern, 117 Burroughs, 1 Computer Division of the Siberian Burroughs B6700, 49 Department of the USSR Academy of Burroughs B77oo, 26, 73, 102, 109 Science, 210 Burroughs Corp., 157, 159, 170, 197, 226 Computer System Architects, 245 Burroughs D825, 101 Computing Surface, 24, 77, 102, 103, 163 Bus Congestion, 9 Concert, 24, 163 Butterfly, 155 Concurrent Computer 32xO, 25, 164 BVM, 23,159 Concurrent Computers, 101, 164 Concurrent processing, 5 C-DOT,230 Connection Machine, 15, 23, 42, 48, 53, 55, C-LISP,30 57 C-LISP Machine, 162 Connection Machine-1,2,200, 164 C.mmp, 12, 25, 63, 162 Connection Machine-2, 82 C. Seitz, 111 Connection Machine-5, 165 C90, 166 Connectionist model, 43 California Institute of Technology, 112, 166 Control, 11 Caltech Concurrent Computation Program, Control Data Corp., 168 111 Control Driven, 8 Caltech Cosmic Cube, 32, 43 Convex, 110, 165 CAP, 29, 159 Convex 2xO, 26, 165 CAP I, II, 200 CORAL, 23, 165 Carnegie-Mellon University, 162, 192, 229, Correlative studies, 21 232,257 Cosmic Cube, 26, 166 CDC,71 Cost of software, 116 Cedar, 16, 23, 52, 66, 92, 119, 121, 160 Cost/performance ratio, 32 Cellular, 185 CPS, 121 Cellware Ltd, 207 Cray Research Corp., 75, 100, 166, 167 Celtia, 23 Cray X-MP, 88, 109 CEN,246 Cray X-MP, Y-MP, 26 Centralized control, 46 Cray Y-MP, 50 Centre for Development of Telematics, 230 Cray-1, 11, 17 CERT, 204, 210, 213 Cray-2, 167 CESAR, 28, 160 Cray-3,167 Cetia 1000, 160 CROSS 8, 29, 107, 167 Charles Stark Draper Research Laboratory, CSELT Laboratories, 222 187, 188 Culler, 167 CHoPP, 26, 52, 161 Culler PSC, 26, 110 CHoPP-I, 52 CYBA-M, 29, 168 Circuit switching, 13 Cyber 875, 990, 26, 27, 168 City University, 251 CyberPlus, 27, 71, 75, 168 Classification Schemes, 6 Cybers,75 CLIP 4, 28, 42, 106, 161 Cydra-5, 26, 53, 169 CLIP 7, 28, 42, 53, 106, 161 Cydrome Inc., 169 Cm*, 25, 63, 75, 162 Cyto I, 28, 169 CM-5, 16, 23, 80, 84, 165 Cyto Systems, 169 CM-1, 164 Cytocomputer, 28, 42 CM-2,164 Cytocomputer I, 169 CM-200,164 CNAPS, 258 D825, 1,25,32,170 Cogent, 258 DADO, 30, 36, 50, 59, 170 Columbia QCD Machine, 26, 32, 163 DAFIM, 30, 170 Columbia University, 36, 52, 161, 163, 170, DAMP, 23, 170 220 DAP, 23, 42, 55, 104, 171 Commercial parallel processors, 90 DASH, 25, 69, 84, 171 Index 263

Data Driven, 8 EDDY, 25,177 Data Exchange and Synchronization, 12 EGPA, 24, 68,80,87,178 Data flow, 6 EI'brus,25 Data Transport, 28 El'brus family (-1, -2, -3), 178 Data Transport Computer, 172 Electronic Corporation of India Ltd., 222 Data West, 49, 211 Electrotechnical Laboratory, 151, 179, 201, Data-driven, 11 241 Database management machines, 107 ELI, 57, 178 Dataflow Technology Nederland, 177 ELI-512, 23, 52, 104 Dataflow with I-Structures, 8 ELSAG,180 Dataflow with tokens, 8 Elxsi,179 Datarol, 174 Elxsi 6400, 23, 103, 179 DataWave, 29, 172 EM-3, 30, 179 DATIS-P, 27, 172 EM-4, 30, 179 David Sarnoff Research Center, 231 EMMA, 29, 119, 180 DBC/1012, 248 EMMA-2, 28, 37, 90, 180 DDC, 38,158 Empress, 23, 180 DDDP, 25, 173 Encore Corp., 181 DDP, 25, 50, 173 Encore Multimax, 24, 101, 110 DEC, 254 Environmental Research Institute of DEC Research, 185 Michigan, 169 Decentralized control, 46 ERIM,169 Delft Parallel Processor, 27, 173 Erlangen General-Purpose Architecture, 68 Delta, 223 ES-1, 27, 181 Demand Driven, 8 ES-2701, 27, 182 Demand-driven, 11 ES-2703, 27, 182 Denelcor Corp., 174 ES-2704, 182 Denelcor HEP-1, 26, 75, 88 ES-2704/2727, 30 DFM,30 ES-2727, 182 Digital Equipment Corp., 185, 254 ESL Systolic Array, 27, 28, 48, 183 Digital/Analog Design Associates, 227 ESPRIT,50 DIPOD, 29, 53, 174 ETA, 27, 75, 102, 109, 183 DIRECT, 30, 175 ETH, 180, 202, 206, 246· Directory-based cache coherency, 119 Evans & Sutherland Computer Corp., 181 DIRMU 25,63 EVE, 57, 258 DIRMU-25, 25, 81, 175 Execution streams, 7 Distributed memory, 90, 121 Exotic control mechanism, 47 Distributed processing, 5 Expert systems, 33 Distributed processors, 17 External attributes, 21 DMMP, 8, 69 DMSV, 8, 69 FAIM-1, 30, 50, 183 DOOM, 26, 50, 175 FASP, 28, 184 DPP81,DPP84,173 Fault tolerance, 17, 37 DPS-1, 23,87,90,176 FDS-RII, 29, 184 Dragon, 24, 176 Federal University of Rio de Janeiro, 215 DSP-3, 23, 176 FEM, 27, 32, 43, 184 DTM-1, 28, 177 FERMATOR, 23, 185 DTN, 29, 35, 177 FERMILAB, 121, 147 Duke University, 157 FFP, 30, 43, 120 Dusty deck FORTRAN, 47 Fifth Generation, 30 Dynamic interconnection networks, 69 Fifth Generation Computer, 36, 50, 53, 59, Dynamic networks, 14, 43, 121 84, 185 Fifth Generation Computing System, 33, 93 ECIL,222 Firefly, 24 Ecole Normale Superieure, 229 Firefly I,ll, 185 Economic considerations, 59 Flagship, 30, 186 264 Index

Flex/32, 90, 111, 186 HBA-I, II, 192 Flexible, 186 HCl6-186, 29, 107, 193 Flexible Flex/32, 25 HCL,208 FLIP, 33, 186 HCRC, 24, 193 Floating Point Systems Computing, 187,207 HDM, 29, 107, 193 Flosolver, 27, 187 HDVSP, 29, 194 Flynn, 6,11 Hebrew University, 208 Flynn's classification, 6 Hector, 23, 194 Fox, Geoffrey C., 17, 111 Heidelberg University, 75, 229 FPS,112 HEP-l, 109, 119, 174 FPS Computing, 207 Hewlett-Packard, 152 FPS T-series, 73, 187 Hierarchical bus system, 75 FPS-T, 26, 77 Hitachi, 198 FTMP, 24, 187 Hitachi S81O, 17 FTPP, 24, 188 Hitachi S81O/20, 11 Fuji Xerox, 209 Hoare, 63 Fujitsu, 188, 207, 255 Honeywell, 58, 192, 194 Fujitsu Kabu-Wake, 30, 33, 188 Honeywell 6000, 27, 194 Fujitsu Laboratories, 151, 159, 197, 209 Hughes Aircraft, 148, 191, 192 Fujitsu VP200, 11, 17 Hughes Research Laboratories, 195, 200 Fujitsu's CAP, 35 Hughes Systolic, 23 Functional programming, 33, 36 Hughes Systolic/Cellular Array Processor, Future trends, 115 195 FX/Series, 149 Hybrid Computer Research Centre, 193 Hybrid Parallel Machine, 215 GAM, 28, 50, 57, 106 Hypercube, 80, 111 GAM I, II, 188 Hypercube network, 79 Gamma, 30, 37, 189 HyperFlo, 27, 195 GAPP, 28, 189 Hyperstore Systems Inc., 147 GEC Hirst Research Centre, 222 Gemini, 103, 112, 223 I-structures, 54 Gemini Computers, 252 IAPX,50 General Electric, 190, 257 IAPX-432, 26, 195 General-purpose machines, 32 IBM, 32, 146, 190, 196, 197, 205, 213, 236, Geometric Arithmetic Parallel Processor, 189 250, 255, 256, 257, 258, 259 George Mason University, 188 IBM 308x, 25, 101 GF-l1, 26, 104, 190 IBM 3090, 27, 196 GF11,l1 IBM Power Visualization System, 196 GMMP,8 IBM PVS, 29 GMSV,8 IBM RS6000, 121 Goodyear, 215, 244 IBM SP1, 24, 120, 197 Gould, 220 IBM T. J. Watson Research Center, 15 GP1000, 155 IBM's 3090, 110 Granularity, 39 IBM's GF-11, 32 Graph Reduction, 8 ICL,l71 GRID, 28, 42, 190 ICOT, 33, 93, 216, 227 GRIP, 26, 59, 191 IDATEN, 29, 33, 197 Grosch's law, 91, 109 IEEE P896 Futurebus, 48 nSC,216 H4400, 25, 191 ILLIAC IV, 26, 104, 197 HAP, 23, 50, 59, 191 Image processing, 32 HAPPE, 27, 58, 84, 192 Imperial College, 148, 186 Harbin Institute of Technology, 248 Implicit parallelization, 47 Harpy, 29, 107, 192 Inference, 33 Harris Corp., 163 Institute of Cybernetics of the Ukrainian HBA, 28, 29 Academy of Science, 182 Index 265

Institute of Precision Mechanics and LDF-100, 32, 53, 59, 204 Computer Technology, 178 LEMUR, 25, 68, 84, 204 Institute of Technical Cybernetics, 170 Leningrad Institute of Informatics and Instruction flow, 6 Information, 182 Intel Corp., 156, 195, 199, 213, 223, 257 Leopard,24 Intel i860, 72 Leopard-I,II, 205 Intel's iAPX, 53 LINKS-I, 29, 205 Interconnection Network, 13 Link 226 International Business Machine Corp., 146, Local memory systems, 69 190, 196, 197, 205, 213, 236, 250, 255, Lockheed Palo Alto Research Laboratory, 256, 257, 258, 259 152 International Computer Scientific Institute, Logarithmically structured transfer, 13, 80 235 Logic, 8 International Conference on Parallel Logica Space and Defence Systems, 174 Processing, 18 Long access time, 87 International Parallel Machines, 198 Loral Instrumentation, 204 Interstate Electronics, 234 Los Alamos National Laboratories, 88, 233 InterSystems, 87, 176 LSM, 23, 32, 205 IP, 28, 33, 198 LST,81 IP-l, 25, 198 LUCAS, 23, 42, 206 IPSC, 26, 104, 112 Lund University, 206 IPSC iPSC/2 &iPSC/860, 199 IRECE,249 M3, 24, 206 ITT Advanced Technology Center, 200 M5pS, 24, 206 ITT CAP, 23 M-l,29 ITT Intermetall, 172 M-l Cellprocessor, 207 IUA, 30, 50, 200 M-1800, 27, 207 IWARP, 257 M64, 26, 58, 207 IXM, 30, 201 Macro dataflow, 16, 117 IXM2, 30, 201 MACSYM, 29, 33, 208 MAGNUM, 25, 208 J-Machine, 26, 213 Mago,43 Johns Hopkins Hospital Applied Physics Makbilan, 25, 208 Laboratories, 234 Manchester Dataflow Computer, 25, 84, 209 Johnson, 8, 69 Manchester University, 52, 186, 216, 224 MANJI-II, 30, 209 K2, 23, 202 MAPLE, 29, 209 Keio University, 153,209,242 MaRS, 30, 55, 210 Keller, 120 MARS-M, 27, 210 Kendall Square Research, 202 Martin Marietta, 154, 189 Kobe University, 223 MasPar, 55, 73 KRPP, 203 MasPar Computer, 214 KSR1, 27, 66, 69, 84, 90, 202 Masscomp, 210 Kuck,7 Masscomp 5700, 6700, 27, 210 Kyoto University, 162, 208 MATP, 26, 102, 109 Kyushu Reconfigurable, 25 MATP Real-Time III, 49, 211 Kyushu University, 203 Max Plank Plasma Physics Institute, 216 Kyushu University Reconfigurable Parallel MC860VS, 25, 211 Processor, 63, 203 Megaframe Supercluster, 24, 103 Megaframe Supercluster - 64 & 256, 211 Landmark II, 25, 203 Meiko,163 LAP, 28, 42, 203 Melbourne Decouple Multicomputer Latency, 13 Architecture, 212 LAU, 25, 26, 50, 204 Melbourne Decoupled, 23 Lawrence Livermore National Laboratories, Memory Organization and Addressing, 14 49, 237 Mercury Computer Systems, 211 266 Index

Message-passing, 39, 62, 63 NERV, 30, 219 Micro-El'brus, 178 New York University, 253 MIDAS, 25, 32, 212 NITsEVT, 182 MIMD, 7,11 Non-control-flow mechanisms, 50 Minerva, 24, 212 Non-Von, 30, 50, 55, 59 MISD,7 Non-Von-1,3,4, 220 MIT, 52, 163, 213, 214 Nonblocking, 14 MIT Lincoln Laboratories, 217, 227 Nonorthogonality, 7 MITE, 29, 213 NonStop, 107 MITRE,177 NonStop Series (TXP, CLX, VLX), 247 Mitsubishi, 193 Norsk-Data, 220 Mitsui,225 Norsk-Data ND-5900, 27 Mk III, 120, 253 North Carolina State University, 156, 157 Model for signal processing, 100 North Carolina University, 185 MODULOR, 24, 213 Norwegian Defense Research Establishment, Monarch, 156 160 Monsoon, 26, 214 Norwegian Institute of Technology, 167, 193 MP-1, MP-2, 214 NP-1, 27, 220 MP-1,2,26 NTT, 146, 174, 177,234 MPH, 25, 215 NTT Communication and Information MPP, 28, 42, 55, 106, 215 Laboratories, 230 MPS, 23, 215 NTT Electrical Communications MU6V, 26, 52, 216 Laboratories, 191 Multi-PSI, 30, 216 Number and Type of Processors, 12 Multi-VLIW,57 Numeric applications, 32 Multibus II, 48 NYU's Ultracomputer, 101 Multicomputers, 17 Multiflow Computer Inc., 252 OKI Electric Industry, 173, 227 Multiflow Corp., 50 OMEN 60, 61, 84, 221 Multimax Series, 181 ONERA-CERT, 213 MultiMicro, 24, 216 Osaka University, 205 Multiple VLIW, 51 OSCAR, 27, 221 Multiprocessors, 17 OUPPI-1, 23, 221 Multistaged interconnection networks, 80 MULTITOP, 23, 216 P~PS,63,81,88, 108,233 MUMS, 24,217 P1,224 MUNAP, 25, 217 PACE, 27, 222 MUSE, 27, 217 Packet switching, 13 MXCL-5,169 PACS Series, 225 Myrias 4000, 25, 101, 111, 119, 218 PADMAVATI, 30, 222 Myrias Research, 218 Palmer, John, 112 PAPIA, 28, 37, 50, 59, 106, 222 NAS, 218 Paradigm, 119, 119 NAS AS9080, 27, 218 Paragon, 27, 120 NASA Langley Research Center, 184, 187 Paragon XPIS, 223 National Aeronautical Laboratory, 187 Paralex Gemini; Pegasus, 27 National Physics Laboratory, 203 Paralex Research, 223 National University of Mexico, 147 Parallel processing, 5 NCR,189 PARK, 30, 223 NCUBE, 27, 103, 112, 219 ParSiFal, 24, 77, 103,224 NCUBE/x, nCUBE2/x Series, 219 Parsytec, 211, 224 ND-5900 Series, 220 Parsytec GC and GCel, 224 NEC, 194, 246 Parsytec's Megaframe Supercluster, 77 NEC C&C Systems Research Laboratory, 248 Parwell-1, 24, 224 NEC SX-1, 11 PASM, 28, 37, 50, 61, 225 NEC SX-2, 110 Pattern Driven, 8 Index 267

Paul Sabatier University, 246 R-256, 27, 234 Pavia University, 222 RAP, 28, 30, 50, 55, 105, 234, 235 PAX, 27, 225 Raytheon Corp_, 234 PC/M,195 RCA,235 PCLIP, 28, 50, 226 RCA-215, 25, 235 Peak performance, 111 Rediflow, 120 Pegasus, 103, 223 Reduced traffic, 120 PEPE, 28, 226 Research Institute for Information Processing Period of Construction, 15 and Pattern Recognition, 186 Philips, 50, 175 Ring Array Processor, 235 Philips Forshungslaboratorium, 244 RISC processors, 117 Philips Research Laboratories, 230, 251 RMIT, 26, 235 PICAP II, 28, 226 Royal Melbourne Institute of Technology, 235 PIM,39 RP3, 15, 27, 236 PIM-D, 30, 33, 227 RPA, 23, 48, 52, 236 PIM-R, 30, 33, 39, 84, 227 RSRE, 146, 174, 237 PIP, 28, 227 RST, 24, 236 PIPE, 29, 227 RTP, 24, 77, 102, 103, 237 Pipelined vector processors, 17 Pixel-Planes 5, 29, 228 S-I, 12, 26, 49 PIXI~5000, 29, 42, 228 S-1 (Mark-I,IIA), 237 Pleiades, 30, 228 Sanders Associates, 221 PLURIBUS, 25, 32, 228 Saxpy Computer, 238 PLUS, 25, 229 Saxpy Matrix-I, 12, 26, 48, 238 ·PM2I, 13, 81 Scalability, 118 Polyp, 24, 32, 75, 229 Scalable POWERparallel System, 197 POMP, 29, 229 Schlumberger,50 POOMA, 29, 230 Schlumberger Palo-Alto Research, 183 Power Visualization System, 196 Scientific Research Institute of Control PPS, 29,230 Computing Systems, 232 Presto, 30, 230 Scientific-Research Centre for Electronic Price/performance, 32 Computer Technology, 182 PRIME, 24, 231 SDFM, 26, 238 Princeton Engine, 28, 231 Seitz, 17 Pringle, 24, 231 Semantic Network Array Processor, 242 Private Memory, 8 Sequent, 37, 101, 110, 238, 239 PRODIGY, 24, 232 Sequent Balance, 24 Production System Machine, 30, 232 Sequent Symmetry, 24, 25 Programmable Image Processor, 227 Sequoia, 25, 111 Prototype Pyramid, 57 Sequoia Series 400, 239 Prototype Pyramid Machine, 226 Sequoia Systems, 239 PS-2000, 2100, 232 Severodonetsk Institute Building Plant, 232 PS-2000/21OO, 26 SGI 4D-MP, 29, 239 PSC, 167 Shared Memory, 8 PSM, 29, 232, 233 Shared-memory, 39, 62 PUPS, 25, 233 Shuffle-Shift Machine, 244 Purdue University, 225, 231 Siberia, 26, 240 PVS, 196 SiDBM, 29, 30, 107,240 PX-l, 28, 233 Siemens, 156, 242 Sigma-I, 27, 241 SIGMA-9, 24, 241 QCD machine, 52, 163 Signal processing, 33 Quadtree, 106 Silicon Graphics, 121 Quadtree pyramid, 13 Silicon Graphics Computer Systems, 239 Quantum chromodynamics, 11,32 Silicon Graphics' 4D-MP, 35 QUEN 16, 234 SIMD,7 268 Index

SIMD Processor Development System, T -Series, 187 243 Taganrog Radio Engineering Institute, 182 SISD,7 Taiwan National University, 27 SKY Computers Inc., 241 Taiwan National University Hypercube, 247 SKYbolt, 29, 35 Tandem Corp., 247 SKYbolt-mp, 241 Tandem NonStop, 29, 37 Slotnick, 1 Taxonomy, 6 Smart memory modules, 54 TC2ODO,155 Smart networks, 120 TDM, 29, 37, 50, 59, 61, 248 SMS 201, 27, 32, 242 Teradata, 107,248 SNAP-I, 30, 242 Teradata Database Computer DBC/1012, 29, Snoopy caching, 63 37, 42, 84, 248 SOLOMON, 1, 104 Terra, 90 Southampton University, 48 Terrano, 32 SP1, 197 Texas Instruments, 173 SPARC Server, 25 TF-1, 120 SPARCserver 10, 600MP, 243 The Cellular Computer, 43 SPARCstation 10, 243 The Institute for Research in Cosmic Physics SPDS, 28, 243 and Relative Technology, 193 Special-purpose machines, 32 Thinking Machines Inc., 164, 165 Sperry Rand, 254 Thomson CSF, 160 Splash, 30, 243 Thread, 6 SPS-3,218 TIP-3, 29, 33, 84, 248 SPUR, 30, 244 Titan, 29 SRC,243 Titan 11,111, 249 SSM, 29, 244 TN, 28, 249 , 171,212 Tokushima University, 165 STARAN, 28, 105 Tokyo Institute of Industrial Science, 184 STARAN-B,E, 244 Tokyo Institute of Technology, 233 Stardent 1500, 3000, 249 TOMP, 24, 249 Static interconnection networks, 75 Tools, 118 Static networks, 14, 43 TOP-I, 29, 250 STC,146 Topologix, 250 Stollman, 238 Topology 100, 24, 77, 102, 103, 250 Stonefield Systems, 161 TOPSTAR, 29, 250 Stream, 6 TOPSY, 23, 251 String Reduction, 8 Torino Polytechnic, 249 Sun Microsystem Computer Corp., 243 Toroidal Net, 249 SupercJuster, 211 Torus, 23, 251 Supercomputer, 5, 71 Toshiba, 256 Supercomputing, 109 Toshiba R&D, 232 SuperSet Plus, 27 Touchstone, 223 SuperSet Plus family, 245 TRAC, 23, 50, 61, 251 Support environment, 118 TRACE/5OD, 26 Suprenum, 26, 75 TRACE/500 Family, 252 Suprenum-1,245 Transition Machine, 25, 252 Swiss Federal Institute of Technology, 180, Transputer, 64, 71, 77, 80 202, 206, 246 Treleaven, 7, 11 SX-3, 26, 246 Tridex,184 SY.MP.A.T.I 1, 28 Trusted Multiple Microcomputer, 252 SY.MP.A.T.I. 1 & 2,246 TRW, 253 SYDAMA, 29, 246 TRW Mark III, 43 Symmetry S81a & S81b, 239 TRW Mk III, 30, 253 Synapse, 107 Tumult-X, 23 Synapse N+1, 247 Tumult-x, 253 Systolic arrays, 11 Turin Polytechnic, 222 Index 269

Thrin University, 222 VFPP,52 Twente University, 253 Victor, 23, 77, 103, 255 Type of Constructing Institution, 15 Virtual processors, 119 Visionary Systems, 153 Ultracomputer, 25, 66, 253 VMEbus,48 UMIST,168 VOLVOX, 24 Unger, 1 VOLVOX TS-8oo, IS-860, 255 UNIVAC, 27 Von Neumann, 8 Univac 1100 family, 254 VP-20OO, 255 Univac 1100 series, 101 VP-2xOO,27 University College, 161, 191 VPP, 28, 37, 256 University of Adelaide, 205 Vulcan, 24, 256 University of California, Berkeley, 212, 231, 244 Waferscale Associative String Processor, 154 University of California, Irvine, 147 WAP, 28, 257 University of Erlangen, 175, 178 WARP, 37, 257 University of Heidelberg, 219 WARP, iWARP, 29 University of Illinois, 52, 150, 160, 197 Waseda University, 221 University of Illinois at Urbana, 92 Washington University, 231 University of Kent, 228 WASP, 154 University of Lund, 217 Wave Tracer, 172 University of Manchester, 209 Wavefront Array Processor, 12, 77, 257 University of Maryland, 259 Weitek chips, 111 University of Massachussettes, 200 Wheel of reincarnation, 116 University of Melbourne, 212 WIPRO, 187, 203 University of North Carolina, 159, 228 Write-back caching scheme, 111 University of Paderborn, 170 WRM,23, 32,43, 257 University of Pisa, 236 University of Provence, 221 X-MP,50 University of Southampton, 236 Xl, 30, 258 University of Southern California, 242, 257 Xerox Data Systems, 241 University of Texas, 251 Xerox PARC, 176 University of Tokyo, 233, 250 XMP, 166 University of Toronto, 185, 194 XTM, 24,77,103,258 University of Tsukuba, 201,225 University of Washington, 226 University of Wisconsin, 175, 189 Yale University, 178 USSR Academy of Science, Siberian YMP, 166 Division, 240 YSE, 23,32,57, 104, 258 Utsunomiya University, 217 Yuppie, 28, 259

VAX-6240, 25, 254 Zagorsk Electro-Mechanical Factory, 178 VAX-9OO0, 23, 254 ZMOB, 30, 259 270 Index
