US005963745A Ulllted States Patent [19] [11] Patent Number: 5,963,745 Collins et al. [45] Date of Patent: Oct. 5, 1999

[54] APAP I/O PROGRAMMABLE ROUTER 0208497 6/1986 European Pat. Off. . 208457A2 6/1986 European Pat. Off. . [75] Inventors: Clive Allan Collins, Poughkeepsie; _ _ Michael Charles Dapp, EndWell; (L1St Con?rmed 0“ next Page) James Warren Die?'enderfer; David Christopher Kuchinski, both of OTHER PUBLICATIONS OYVegO; Bllly Jack KHOWIEQ KlPgstons CK. Baru and S.Y.W. Su, “The Architecture of SM3: A Rlchard Eelward Nler> Apalachm> an Dynamically Partitionable Multicomputer System”, IEEE (éf biY$PEni{Et‘)lgetn; gi?iiFlivagen Transactions on , vol. —35, No. 9, pp. 790—802, en er, a.; 0 er els 1c ar son, Sep 1986 gisrtlil’ qigizgrllil?rilg?nlzgg‘zrgest S.P. Booth et al., “An Evaluation of the Meiko Computing EndWZ’H ’ Surface for HEP Fortran Farming’i”, Physics ’ ' ' Communications 57, pp. 486—491, 1989. [73] Assignee: 22:;2333321glgggeljsglxichines (List continued on next page.) Primary Examiner—Meng-Ai T. An [21] Appl' N05 08/430,114 Assistant Examiner—John Follansbee [22] Filed. Apt 27’ 1995 Attorney, Agent, or Firm—Lynn Augspurger; Morgan & Finnegan, L.L.P.

[63] (fntcilnuatcilonRelated °hf_ agRlication U-S. Application N9 07/887,258, Data Mfay 221s 1992, Aparallel array processor for massively parallel applications 112110077211 f?“‘i%‘é%“°§rl§li§’§ii§ shore: s sssss with lsw sswss @405 with DRAM .sssssssss continuation-in-part of application No. 07/798,788, Nov. 27, W_h11e meorporatmg proeessmg elemenFs on a _S1ng1e ch11?‘ 1991, abandoned, Eight processor memory elements on a single chip have their 6 oWn associated processing element, signi?cant memory, and [51] Int. Cl...... ~G061?‘ 15/16 I/O and are interconnected With a hypercube based, but [52] US. Cl...... 395/80013, 395/8001, modi?ed, topology‘ These nodes are then interconnected, _ 395/800'12’ 395/800'14 either by a hypercube, modi?ed hypercube, or ring, or ring [58] Fleld of Search ...... 395/800, 820, Within ring network topology The architecture uses an the 395/800~1> 800~12> 800~13> 800~14> 200~68> pins for networking. Each chip has eight 16 bit processors, ZOO-73> ZOO-79> 80011; 364/DIG- 1s BIG 2 and eight respective 32K memories. I/O has three internal _ ports and one external port shared by the plural processors [56] References Clted on the chip. Signi?cant software ?exibility is provided to Us PATENT DOCUMENTS enable quick implementation of existing programs Written in common languages. The scalable chip has internal and 3,537,074 10/1970 Stokes et a1...... 340/172 external Connections for broadcast and asynchronous SIMD, FEW}; 6‘ a1- - MIMD and SIMIMD (SIMD/MIMD) With dynamic switch 4’1O1’96O 72978 5:22;: ' 36/4060 ing of modes. A fully distributed programmable router is ’ ’ ' """"""""""""" " provided by the processing memory elements that form a (List continued on next page.) node. There is program compatibility for the fully scalable s stem. FOREIGN PATENT DOCUMENTS y 0132926 2/1985 European Pat. Off. . 14 Claims, 24 Drawing Sheets

EXTERNAL +w +x +y +1 PORTS 4

PE PE PE (*W) (+X) (+y) CMD/INST —— I ‘__l

_——SERREO F'CAST :. gs;u FACE ' ‘J '7

SERIAL LOOPS (-w)

EXTERNAL PORTS 5,963,745 Page 2

U.S. PATENT DOCUMENTS 4,992,933 2/1991 Taylor ...... 364/200 5,005,120 4/1991 Rn61Z . 364/200 4,107,773 8/1978 Gilbreath er al------364/200 5,006,978 4/1991 NeClleS ...... 364/200 4,270,170 5/1981 R6dd6w6y ...... 364/200 5,008,815 4/1991 H1ll1S ...... 364/200 4,314,349 2/1982 Batcher ...... 364/716 5,008,882 4/1991 Peterson et al. 370/94.3 4,338,675 7/1982 Palmer et al. . .. 364/748 5,010,477 4/1991 Omoda et al. .. 364/200 4,380,046 4/1983 Fung ...... 364/200 5,016,163 5/1991 JCSSllOpC 6161. 364/200 4,394,726 7/1983 Kohl ...... 364/200 5,020,059 5/1991 Gorin etal 371/11-3 4,412,303 10/1983 B61n6s 6161...... 364/900 5,021,945 6/1991 Morrison 91 al 364/200 474357758 3/1984 Lorie et aL _ " 364/200 5,038,282 8/1991 Gilbert et al. .. . 364/200

4,468,727 8/1984 Carrison"u' .." 364/200 510411189 8/1991 Tamltamnu‘ . . . . .1..." -- ...- 364/200 4,498,133 2/1985 Bolton 61 61...... 364/200 218213;; 31331 (Lhwegilelt ‘13- l" - 322/588 45234’598’400 273 619857Z9,“ AdH?‘ligls’ 111 e 1 a 1...... 364378/60 200 5,047,9171 1 9/1991 A111666161.CV1“ ‘1 e ‘1' ~ .. 395/500 4,604,695 8/1986 Widen 6161...... 364/200 21029982 9/1991 Lie 6‘ al- - 357/81 4,621,339 11/1986 W6gn616161...... 364/200 10 61000 10/1991 C “5 ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ " 364/200 4,622,650 11/1986 KllliSCll ...... 364/748 510721217 12/1991 Gemgw“ 6‘ al- 340/825-79 4,706,191 11/1987 H6n16116 6161...... 364/200 511131523 5/1992 C911” 6‘ ‘11' 395/800 4,720,780 1/1988 Do1666k ...... 364/200 511211498 6/1992 (Elbert 6‘ ‘11' ~ 395/700 477367291 4/1988 Jennings et a1~ “ 364/200 2,136,282 8/1992 Frlroozmand 370/85.1 4,739,474 4/1988 HolsZtynski ...... 364/200 1142’ 40 8/1992 Gflsfer 37mm 4,739,476 4/1988 F1dn6616 ...... 364/200 511461608 9/1992 H9115 - 395/800 4742 511 “988 Johnson __ __ 37O/4O6 5,165,023 11/1992 Gifford 395/325 477487585 5/1988 (31116111111 ...... 364/900 511701482 12/1992 Sh“ 6‘ a1~~ 395/800-12 4,763,321 8/1988 c61v1gn66 6161...... 370/94 511701484 12/1992 GOrOdalskl ~~ 395/800 477807873 @1988 Mattheyses 5,173,947 12/1992 c116nd6 6161. .. 382/41 4783 738 11/1988 L16161...... 364/200 511751862 12/1992 Phelps etal- - 395/800

4783782 11/1988 Morton ...... 371/11 511751865 12/1992 H?hs ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ " 395/800 478057091 2/1989 Thiel et a1 5,181,017 1/1993 Frey, Jr. et al. 340/825.02 ’ ’ ' 5 187 801 2/1993 ZeIliOS 6161. 395/800 4,809,159 2/1989 sow6 ...... 364/200 > > _ 4809 169 2/1989 8161116161...... 364/200 511891665 2/1993 Nlehaus 6‘ ‘1T ~ ~ 370M581 478097347 2/1989 Nash 6161. 382/49 511971130 3/1993 Chen etal- - 395/325 4,814,980 3/1989 P61616on 6161...... 364/200 512121773 5/1993 H111“ ~~~~~~~~~ ~~ 39500073 4,825,359 4/1989 0111<6n116161 ...... 364/200 512121777 5/1993 Gove etal 395/570 4 831 519 5/1989 Morton ~~~~~ n " 5,218,676 6/1993 Ben-Ayed et al. 395/200 4835729 5/1989 Morton ...... 364/200 512181709 6/1993 Fil'any 6‘ al - 395/800 4,841,476 6/1989 M116116116161...... 364/900 512301079 7/1993 Grf’ndalskl " " 395/800 478477755 7/1989 Morrison et a1‘ _ __ 5,239,629 8/1993 M111616161. 395/325 4 849 882 7/1989 Aoy61n6 6161...... 364/200 512391654 8/1993 lng'simmns 6‘ al- - 395/800 4’852’048 7/1989 Morton 364/200 5,251,097 10/1993 Simmons et al...... 361/687 478557903 8/1989 Carleton et a1‘ 364/200 5,253,359 10/1993 SprX et al...... 395/575 4858110 8/1989 Miyata ~~~~~ __ 364/200 5,265,124 11/1993 8166116161...... 375/3 4’86O’2O1 8/1989 stolfo et al' 364/200 5,280,474 1/1994 NiCkOllS 6161...... 370/60 478727133 “V1989 Leeland “ 364/748 5,297,260 3/1994 Kametani ...... 395/325 478737626 @1989 Gifford __ 364/200 5,355,508 10/1994 K6n ...... 395/800 478917751 M990 can et a1‘ 395/8O006 5,367,636 11/1994 co116y 6161...... 395/200

4891 787 M990 Gifford ______395687 5,379,440 1/1995 K611y 6161...... 395/800.12 4,896,265 1/1990 Fiduccia et al. 364/200 4,901,224 2/1990 Ew611 ...... 364/200 FOREIGN PATENT DOCUMENTS 419021260 2/1990 BOSE‘? iFal- ~l 370/60 340668A2 4/1989 E1nop66n P61. 011. 4190 1143 2/1990 Ta ‘1 as 1 e‘ ‘1' 364/200 428327A1 11/1990 E1nop66n P61. 011. 4,907,148 3/1990 Mortlon ...... i. 364/200 429733A2 @1991 European Pat Off‘ 4,910,66 3/1990 Matt eyses et a . 364/200 460599A3 12/1991 European Pat‘ Off‘ 4,916,622 4/1990 s611w61Z ...... 364/748 485690A2 5/1992 European Pat Off‘ 213812847‘ E338 11:10“? 322/388 493876A2 7/1992 E1nop66n P61. 011.. 1 1 / an‘? e / 2223867 4/1990 Un116d Kingdom. 4,922,408 g/1990 Davlrls et all 364/200 89/09967 4/1988 WIPO _ 4,92 ,311 /1990 Nee es eta. 364/200 92/06436 4/1992 WIPO' 4,933,846 6/1990 Humphrey et al. 364/200 4,933,895 6/1990 Grinberg et al. . 364/748 4,942,516 7/1990 Hy611 ...... 364/200 OTHER PUBLICATIONS 4’943’9124 942 517 7;19907 1990 Agymac k ...... et .. al 364200364 200 S.P. Booth et al., “Large Scale Applications. . of 479567772 9/199O Neches ______364/200 in HEP: The Edinburgh Concurrent 4,958,273 9/1990 Anderson 6161. 364/200 Project”, Computer Physics Communications 57, PP 4,964,032 10/1990 Smith 364/200 101—107, 1989. gawis 1 P. Christy, “Software to Support Massively Parallel Com , , u e a - 11 4 984 237 1/1991 Franaszek " 370688 putrng on the MasPar MP—1 , 1990 IEEE, pp. 29—33. 4,985,832 1/1991 Grond61ski, __ 364/200 S.R. Colley, “Parallel Solutions to Parallel Problems”, 4,992,926 2/1991 Janke et al...... 364/134 Research & Development, pp. 42—45, Nov. 21, 1989. 5,963,745 Page 3

E. DeBenedictis and J .M. del Rosario, “nCUBE Parallel I/O Boubekeur et al., “Con?guring AWafer—Scale Two—Dimen Software”, IPCCC ’92, 1992 IEEE, pp. 0117—0124. sional Array of Single—Bit Processors”, Computer, vol. 2, T.H. Dunigan, “Hypercube Clock Synchronization”, Con Issue 4, Apr. 1992, pp. 29—39. currency: Practice and Experience, vol. 4(3), pp. 257—268, Korphiharju et al., “TUTCA Con?gurable Logic Cell Array May, 1992. Architecture” IEEE, Sep. 1991, pp. 3—3.1—3—3.4. TH. Dunigan, “Performance of the iPSC/860 and C.K. Baru and S.Y.W. Su, “The Architecture of SM3: A Ncube 6400 hypercubes *”, 17, pp. Dynamically Partitionable Multicomputer System”, IEEE 1285—1302, 1991. Transactions on Computers, vol. C—35, No. 9, pp. 790—802, DD. Gajski and J .K. Peir, “Essential Issues in Multiproces Sep. 1986. sor Systems”, 1985 IEEE, pp. 9—27, Jun. 1985. SP. Booth et al., “An Evaluation of the Meiko Computing A. Holman, “The Meiko Computing Surface: A Parallel & Surface for HEP Fortran Farming*”, Computer Physics Scalable Open Systems Platform for Oracle”, A Study of a Communications 57, pp. 486—491, 1989. Parallel Database Machine and its Performance—The NCR/ SP. Booth et al., “Large Scale Applications of Transputers Teradata DBC/1012, pp. 96—114. in HEP: The Edinburgh Concurrent Supercomputer J.R. Nickolls, “The Design of the MasPar MP—1: A Cost Project”, Computer Physics Communications 57, pp. Effective Massively Parallel Computer”, 1990 IEEE, pp. 101—107, 1989. 25—28. P. Christy, “Software to Support Massively Parallel Com J .F. Prins and J .A. Smith, “Parallel Sorting of Large Arrays puting on the MasPar MP—1”, 1990 IEEE, pp. 29—33. on the MasPar MP—1*”, The 3rd Symposium on the Fron S.R. Colley, “Parallel Solutions to Parallel Problems”, tiers of Massively Parallel Computation, pp. 59—64, Oct., Research & Development, pp. 42—45, Nov. 21, 1989. 1990. JR. Nickolls, “The Design of the MasPar MP—1: A Cost J .B. Rosenberg and J .D. Becher, “Mapping Massive SIMD Effective Massively Parallel Computer”, 1990 IEEE, pp. Parallelism onto Vector Architectures for Simulation”, Soft 25—28. ware—Practice and Experience, vol. 19(8), pp. 739—756, J .F. Prins and J .A. Smith, “Parallel Sorting of Large Arrays Aug., 1989. on the MasPar MP—1*, The 3rd Symposium on the Frontiers J .C. Tilton, “Porting an Iterative Parallel Region Growing of Massively Parallel Computation”, pp. 59—64, Oct., 1990. Algorithm from the MPP to the MasPar MP—1”, The 3rd J .B. Rosenberg and JD. Becher, “Mapping Massive SIMD Symposium on the Frontiers of Massively Parallel Compu Parallelism onto Vector Architectures for Simulation”, Soft tation, pp. 170—173, Oct., 1990. ware—Practice and Experience, vol. 19(8), pp. 739—756, “ Balance and Symmetry Aug. 1989. Series”, Faulkner Technical Reports, Inc. pp. 1—6, Jan., J .C. Tilton, “Porting an Interative Parallel Region Growing 1988. Algorithm from the MPP to the MasPar MP—1”, The 3rd “Symmetry 2000/400 and 2000/700 with the /ptx ”, Sequent Computer Systems Inc. Symposium on the Frontiers of Massively Parallel Compu “Symmetry 2000 Systems—Foundation for Information tation, pp. 170—173, Oct., 1990. Advantage”, Sequent Computer Systems Inc. “Sequent Computer Systems Balance and Symmetry “Our Customers Have Something That Gives Them an Series”, Faulkner Technical Reports, Inc., pp. 1—6, Jan., Unfair Advantage”, The nCUBE Parallel Software Environ 1988. ment, nCUBE Corporation. “Symmetry 2000/400 and 2000/700 with DYNIX/ptx Y.M. Leung, “Parallel Technology Mapping With Identi? Operation System”, Sequent Computer Systems Inc. cation of Cells for Dynamic Cell Generation”, Dissertation, “Symmetry 2000 Systems—Foundation for Information Syracuse University, May 1992. Advantage”, Sequent Computer Systems Inc. “The Connection Machine CM—5 Technical Summary”, “Our Customers Have Something That Gives Them an Thinking Machines Corporation, Oct. 1991. Unfair Advantage”, The nCUBE Parallel Software Environ Siegel et al., “Using the Multistage Cube Network Topology ment, nCUBE Corporation. in Parallel Supercomputer,” Proceeding of the IEEE, Dec. Y.M. Leung, “Parallel Technology Mapping With Identi? 1989, pp. 1932—1953. cation of Cells for Dynamic Cell Generation”, Dissertation, Struble. “Assembler Language Programming: The IBM Sys Syracuse University, May 1992. tem/360 & 370.” 2”“ Ed., Reading MA: Addison—Wesley “The Connection Machine CM—5 Technical Summary”, Pub. Co. (1975) pp. 310—313. Thinking Machines Corporation, Oct., 1991. T.A. KriZ and M.J. Marple, “Multi—Port Bus Structure With Fineberg et al., “Experimental Analysis of a Mixed—Mode Fast Shared Memory”, IBM Technical Disclosure Bulletin, Parallel Architecture Using Bitonic Sequence Sorting”, vol. 27, No. 10A, pp. 5579—5580, Mar. 1985. Journal of Parallel And Distributed Computing, Mar. 1991, HP. Bakoglu, “Second—Level Shared Cache Implementa pp. 239—251. tion For Multiprocessor Computers With A Common Inter T. Bridges, “The GPA Machine: A Generally Partitionable face For The Second—Level Shared Cache And The Sec MSIMD Architecture”, The 3rd Symposium on the Frontiers ond—Level Private Cache”, IBM Technical Disclosure of Massively Parallel Computation, Oct. 1990, pp. 196—203. Bulletin, vol. 33, No. 11, pp. 362—365, Apr. 1991. Abreu et al., “The APx Accelerator”, The 2nd Symposium Mansingh et al., “System Level Air Flow Analysis for a on the Frontiers of Massively Parallel Computation, Oct. Computer System Processing Unit”, Hewlett—Packard J our 1988, pp. 413—417. nal, vol. 41 No. 5, Oct. 1990, pp. 82—87. D.A. Nicole, “Esprit Project 1085 Recon?gurable Trans Tewksbury et al., “Communication Network Issues and puter Processor Architecture”, CONPAR 88 Additional High—Density Interconnects in Large—Scale Distributed Papers, Sep. 1988, pp. 12—39. Computing Systems”, IEEE Journal on Selected Areas in E. DeBenedictis and J .M. del Rosario, “nCUBE Parallel I/O Communication, vol. 6 No. 3, Apr. 1988, pp. 587—607. Software”, IPCCC ’92, 1992 IEEE, pp. 0117—0124. 5,963,745 Page 4

TH. Dunigan, Hypercube Clock Synchronization: Concur R. Duncan, “A Survey of Parallel Computer Architectures”, rency:. Practice and Experience, vol. 4(3), pp. 257—268, IEEE, Feb. 90 pp. 5—16. May 1992. C.R. Jesshope et al., “Design of SIMD TH. Dunigan, “Performance of the Intel iPSC/860 and Array”, UMI Article Clearing house, Nov. 88. Ncube 6400 hypercubes*”, Parallel Computing 17, pp. Sener Ilgen & Issac Schers, “Parallel Processing on VLSI 1285—1302, 1991. Associative Memory”, NSF AWard #ECS—8404627, pp. DD. Gajski and J .K. Peir, “Essential Issues in Multiproces 50—53. sor Systmes”, 1985 IEEE, pp. 9—27, Jun. 1985. H. Stone, “Introduction to Computer Architecture”, Science A. Holman, “The Meiko Computing Surface: A Parallel & Research Associates, 1975, Ch. 8, pp. 318—374. Scalable Open Systems Platform for Oracle”, A Study of a R. M. Lea, “WASP: A WSI Associative String Processor” Parallel Database Machine and its Performance—The NCR/ Journal of VLSI Signal Processing, May 1991, No. 4, pp. Teradata DBC/1012, pp. 96—114. 271—285. Baba et al., “A Parallel Object—Oriented Total Architecture: Lea, R.M., “ASP Modules: Cost—Effective Building—Blocks A—NET”, Proceedings Supercomputing, Nov. 1990, pp. for Real—Time DSP Systems”, Journal of VLSI Signal 276—285. Processing, vol. 1, No. 1, Aug. 1989, pp. 69—84. Mitchell et al., “Architectural Description of a NeW, Easily Isaac D. Scherson, et al., “Bit Parallel Arithmetic in a Expandable Self—Routing Computer Network Topology”, Massively—Parallel Associative Processor”, IEEE, V0. 41, IEEE Infocomm, Apr. 1989, pp. 981—988. No. 10, Oct. 1992. K. Padmanabhan, “Hierarchical Communication in Supreet Singh and Jia—Yuan Han, “Systolic arrays”, IEEE, Cube—Connected Multiprocessors”, The 10th International Feb. 1991. Conference on Distributed Computing Systems, May 1990, H. Richter and G. Raupp, “Control of a Tokamak Fusion pp. 270—277. Esperiment by a Set of Multitop Parallel Computers”, IEEE Fineberg et al., “Experimental Analysis of Communications/ vol. 39, 1992, pp. 192—197. Data—Conditional Aspects of a Mixed—Mode Parallel Archi Higuchi et al., “IXM2: A Parallel Associative Processor for tecture via Synthetic Computations”, Proceeding Supercom Semantic Net Processing—Preliminary Evaluation—”, IEEE, puting ’90, Nov. 1990, pp. 647—646. Jun. 1990, pp. 667—673. Kan et al., “Parallel Processing on the CAP: Cellular Array Frison et al., “Designing Speci?c Systolic Arrays With the Processor”, COMPCON 84, Sep. 16, 1984, pp. 239—244. API15C Chip”, IEEE 1990, xii+808pp., pp. 505—517. EZZedine et al., “A 16—bit SpecialiZed Processor Design”, Integration The VLSI Journal, vol. 6 No. 1, May 1988, pp. Berg et al., “Instruction Execution Trade—Offs for SIMD vs. 101—110. MIMD vs. Mixed Mode Parallelism”, IEEE Feb. 1991, pp. A. MudroW, “High Speed Scienti?c Arithemetic Using a 301—308. High Performance Sequencer”,Electro, vol. 6, No. 11, 1986, Raghaven et al., “Fine Grain Parallel Processors and pp. 1—5. Real—Time Applications: MIMD Controller/SIMD Array”, Alleyne et al., “A Bit—Parallel, Word—Parallel, Massively IEEE, May 1990, pp. 324—331. Parallel Accociative Processor for Scienti?c Computing”, G. J. Lipovski, “SIMD and MIMD Processing in the Texas Third Symposium on the Frontiers of Massive Parallel Recon?gurable Array Computer”, Feb. 1988, pp. 268—271. Computation, Oct. 8—10, 1990; pp. 176—185. R.M. Lea, “ASP: A Cost—effective Parallel Microcomputer”, Jesshoppe et al., “Design of SIMD Microprocessor Array”, IEEE Oct. 1988, pp. 10—29. IEEE Proceedings, vol. 136., May 1989, pp. 197—204. Mark A. Nichols, “Data Management and Control—FloW DeGroot et al., “Image Processing Using the Sprint Multi Constructs in a SIMD/SPMD Parallel Language/Compiler”, processon”, IEEE, 1989, pp. 173—176. IEEE, Feb. 1990, pp. 397—406. Nudd et al., “An Heterogeneous M—SIMD Architecture for Will R. Moore, “VLSI For Arti?cial Intelligence”, KluWer Kalman Filter Controlled Processing of Image Sequences”, Academic Publishers, Ch. 4.1. IEEE 1992, pp. 842—845. Mosher et al., “A SoftWare Architecture for Image Process Li et al., “Polmorphic—Torus Network”, IEEE Transactions ing on a Medium—Grain Parallel Machine”, SPIE vol. 1659 on Computers, vol. 38, No. 9, Sep. 1989, pp. 1345—1351. Image Processing and Interchange, 1992/279. Li et al., “Sparse Matrix Vector Multiplication of Polymor PatentAbstracts of Japan, vol. 8, No. 105, May 17, 1984, phic—Torus”, IBM Technical Disclosure Bulletin, vol. 32, p. 274, App. No. JP—820 125 341 (Tokyo Shibaura Denki No. 3A, Aug. 1989, pp. 233—238. KK) Jan. 27, 1984. Li et al., “Parallel Local Operator Engine and Fast P300”, W.D. Hillis, “The Connection Machine”, The MIT Press, IBM Tech. Disc. Bulletin, vol. 32, No. 8B, Jan. 1990, pp. Chapters 1, 3, and 4. 295—300. “Joho—syori”, vol. 26(3), 1985—3, pp. 213—225, (Japanese). U.S. Patent 0a. 5, 1999 Sheet 1 0f 24 5,963,745

ZIBUS .7IMANTISSA] FPU EXPDNENT‘r r_____1 ZEUS ‘

sIIIET NORMALIZE 1

A REG DIN BUS INTEREAGEDATA BUS DIN BUS A REG ? I DoDT BUS \ DDDT BUS A I GB REG :> I c8 REG ‘ AFPOPCODE y INSTRUCTIONSTREAMER /\——'\lDBUS‘f INTERFACE i1) INSTRUCTION

U CONFIGURATION B TIMERS REGISTER & U TIMING CONTROL \ S ALU EXTERNAL @ MEMORY INTERFACE /..I \,_ wNK U PTR REG INPUT DATA REG - LINK COUNT REG LOGIC U v w x PTR REG OUTPUT DATA REG —- UNK [LINKSI COUNT REG LOGIC U v‘w x LINK I LINK 2 LINK 3 LINKS v w Z, REGISTERSADDRESS INSTRUCTION FETCH ADDRESS _I<: CHANNEL ADDRESS DATA ADDRESS

M Prior Art U.S. Patent 0a. 5, 1999 Sheet 3 0f 24 5,963,745

0 Ln If)

000 00a 00000000 00000000 0000a 00 00000000 0000000 00000 00000000 0000000 ooooo 00000090 915 00000000 00000000 00000000 00000000 00000000

00000000 00000000 0 00000000 20000 0 0000 0008 0000000 00 0 000000000.) 0‘0000000 00 0 00000000 00000000 00 0 00000000 00000000 00 0 0000000 (0 10000000 000000 \0000000 F _ 00000000 00000000 00000000 . -— 00000000 00000000 00000000 0 _‘ cooooooo oooo 000 00000000 _

_— >- _— 00000000 00000000 00000000 _ < _ 00000000 00000000 00000000 :._ Q1C! 2*,_ oooocooo booooooo oooooooo < 00000000 $00000°0 00000000 E Lu E 3:33:33: 3:32:22: 3:32:22: .- ooooocoo 09000000 00000000 : < "" O 00000000 00000000 00000000 _ O I _ 00000000 00000000 oocooooo L 00000000 00000000 00000000 : m : aooooaoo ooooooco cooooooo _ O - __ 2 _ : Q :

: H

m 550 CD mjm'l-(f) XEQ:C) 5

C! Q4 0 Ll-'L*-'>—m Z-—'0:\_._1U_J

O '- N Z Z _ Z Z 2 Qt: Qt: Q0: Q0: 0-0 r-O r-O r-O <0‘!

@/ "5“0:52:58 Q.. 5%?U Q M9555‘ ! 1/ 34mmUmDU - ZEUSU nuI :33??? / U.S. Patent 0a. 5, 1999 Sheet 5 0f 24 5,963,745

200 \ .' .' 310 APPLICATION _ ARRAY N_ PROCESSOR O 250 ARRAY O2 - 210APEUCATION :- __\______ARRAY OIRECTOR __ _: , 300 PROCESSOR 1 |260\ 270\ | ARRAY O] I APPLICATION ARRAY I 290 22% I PROCESSOR CONTROLLER <9: ARRAY OO APPUCA‘HON I INTERFACE SYNCHRONIZER I W280 PROCESSOR 2 I I I I I I I . i i 230\ I '—------' APPLICATION 240\ Fl 5 PROCESSOR N TEST/DEBUG ___ OEvICE

"51%? HOST APPLICATION PROCESSOR “X35 I:PROGRAMS COMPILE EXECUTE ARRAY CTRLR PERFOSIQSEEEMQI DEMO APPLICATION HOST I’FACE 05/ LIB - M'TOR I'F PME ARRAY I’F “ARRAY RIsC/6OOO TEsT & DEBUG MONITOR

DEBUGGER _ PERFORMANCE MONITOR RS/6OOO & ANALYSIS TEsT & MPP OIACNOsTICs INTERFACE SIMULATOR SIMULATOR MONITOR ASSEMBLER/LINKER & LOAOER

FIG.19

U.S. Patent 041. s, 1999 Sheet 10 0f 24 5,963,745

64 64 64 64 KB KB K8 K8 +x<—~ PE PE 4» -x

CPU -~ CPU CPU CPU l I I _ ‘ ' ' +W<+ PE > PE =-w —~ NETWORK

CPU CPU CPU M CP u H“ PE ‘ PE '0 ‘Y

64 64 64 64 KB K8 K8 K8 +2“ PE PE *’ -Z

k V FIG.1O

EXTERNALPORTS +w3 +xt +y1 +21 L PE PE PE PE J <+w> <44)‘ (+y) (+2) CMD/INST ‘ ‘ ‘..__J

SERREO F'CAST ' INTER B'ST <—— . PE BUS L. —1 I ' H V —'—— \ PE PE PE PE SERIAL LOOPS ['H) (-X) (-y) (-1)? EXTERNAL I I i I PORTS —w —>< —y -1

U.S. Patent 0a. 5, 1999 Sheet 15 0f 24 5,963,745

HOW WOULD A 16 ELEMENT SORT REPEAT THE PATTERN? STAGE 1 2 3 456 7 e910‘

1000

/ (FOR SORTING n DATA ELEMENTS (n e 32%; e N,2' g # OF PE'SO do I = O to (logzn) — 1 doJ=Oto| if (PE#/2'— J) 75.2 0 then TARGET = PE#+2 - J else TARGET = PE#2 - J send DATA to TARGET TTOOJ receive data store in TEMP (if data is not available — woit) 1) 222) +1) 22 = 0

then if TEMP < DATA then DATA TEMP else NOP then if TEMP > DATA then DATA TEMP else NOP end both do's U.S. Patent 0a. 5, 1999 Sheet 16 0f 24 5,963,745

HOST _ HOST PROCESSOR _ A MEMORY

DATA AND COMMANDS T APPLICATION PROCESSOR INTERFACE API

ARRAY CONTROLLER COMMAND DATA /n DATA CLUSTER (U-CHANNEL) SYNCHRONIZER CS

Y I t T T (OPT'ONAL U CLUSTER CLUSTER CLUSTER CHANNEL CONTROLLER O CONTROLLER 1 CONTROLLER N DEVICES, DASD. CC CC - CC GATEWAYS'DISPLAYS, ~ ( p O RT ‘TY p E) ( N ON _p ORTED) ETC.) NORMAL T STATUS /64+P DATA BROADCAST /16+ MS‘E'LKKR PORT 8c STATUS lB'CAST' LOOP V CLUSTER O CLUSTER 1 CLUSTER N PME (512 PME's)