United States Patent [19]
Jaffe et al.

[11] Patent Number: 5,410,727
[45] Date of Patent: Apr. 25, 1995

[54] INPUT/OUTPUT SYSTEM FOR A MASSIVELY PARALLEL, SINGLE INSTRUCTION, MULTIPLE DATA (SIMD) COMPUTER PROVIDING FOR THE SIMULTANEOUS TRANSFER OF DATA BETWEEN A HOST COMPUTER INPUT/OUTPUT SYSTEM AND ALL SIMD MEMORY DEVICES

[75] Inventors: Robert S. Jaffe, Shenorock, N.Y.; Hungwen Li, Monte Sereno, Calif.; Margaret M. L. Kienzle, Somers, N.Y.; Ming-Cheng Sheng, Kaoshiung, Taiwan, Prov. of China

[73] Assignee: International Business Machines Corporation, Armonk, N.Y.

[21] Appl. No.: 157,232

[22] Filed: Nov. 22, 1993

Related U.S. Application Data

[63] Continuation of Ser. No. 426,140, Oct. 24, 1989, abandoned.

[51] Int. Cl.6: G06F 5/06; G06F 13/12; G06F 15/16; G06F 7/00

[52] U.S. Cl.: 395/800; 395/250; 395/275; 395/425; 364/DIG. 1; 364/231.9

[58] Field of Search: 395/800, 375, 275, 425, 395/200

[56] References Cited

U.S. PATENT DOCUMENTS

3,287,703  11/1966  Slotnick          395/800
3,936,806   2/1976  Batcher           340/172.5
3,979,728   9/1976  Reddaway          340/172.5
4,065,808  12/1977  Schomberg et al.  395/325
4,101,960   7/1978  Stokes et al.     395/800
4,380,046   4/1983  Prosch et al.     395/800
4,380,046   4/1983  Forsl et al.      395/800
4,481,580  11/1984  Martin et al.     395/325
4,484,262  11/1984  Sullivan et al.   395/425
4,514,807   4/1985  Nogi              364/200
4,773,038   9/1988  Hillis et al.     395/500
4,783,738  11/1988  Li et al.         395/800
4,901,224   2/1990  Ewert             364/200
5,081,575   1/1992  Hiller et al.     395/325
5,136,717   8/1992  Morley et al.     395/800
5,148,547   9/1992  Kahle et al.      395/800

FOREIGN PATENT DOCUMENTS

2160685  12/1985  United Kingdom

OTHER PUBLICATIONS

D. Parkinson et al., "The AMT DAP 500," Compcon 88-Thirty IEEE Computer Society International Conference, San Francisco, Calif., Spring 1988, IEEE, New York, pp. 196-199.
Wiackless; Massively parallel computer for Digital Signal and Image Processing; May 1989 IEEE.
Roberts; Recent developments in parallel processing; IEEE.

Primary Examiner: Alyssa H. Bowler
Assistant Examiner: L. Donoghue
Attorney, Agent, or Firm: Scully, Scott, Murphy & Presser

[57] ABSTRACT

A two-dimensional input/output system for a massively parallel SIMD computer system providing an interface for the two-way transfer of data between a host computer and the SIMD computer. A plurality of buffers, equal in number to, and distributed with, the individual processing elements of the SIMD computer, are used to provide a temporary storage area which allows data in different formats to be mapped into a format suitable for transfer to the host computer or for transfer to the SIMD processing elements. The temporary storage is controlled in such a way as to transfer entire blocks of data in a single SIMD system clock cycle, thereby achieving an input/output data rate of N bits/cycle for a SIMD computer consisting of N processors. The system is capable of handling irregular as well as regular data structures. The system also emphasizes a distributed approach in having the input/output system divided into N pieces and distributed to each processor to reduce the wiring complexity while maintaining the I/O rate.

[Drawing sheet 6 of 6 not reproduced]

INPUT/OUTPUT SYSTEM FOR A MASSIVELY PARALLEL, SINGLE INSTRUCTION, MULTIPLE DATA (SIMD) COMPUTER PROVIDING FOR THE SIMULTANEOUS TRANSFER OF DATA BETWEEN A HOST COMPUTER INPUT/OUTPUT SYSTEM AND ALL SIMD MEMORY DEVICES

This is a continuation of application Ser. No. 426,140 filed on Oct. 24, 1989, now abandoned.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to an input/output system for SIMD parallel computers, and more particularly, to a distributed input/output system using a temporary storage buffer, individual for each processing element of the SIMD computer, capable of providing a two-dimensional data transfer scheme that substantially increases the I/O rate of the SIMD system.

2. Discussion of the Prior Art

Scientists and engineers from all disciplines have become dependent upon computers to further their work, and with this dependency they have grown to expect the performance of these computers to increase by an order of magnitude approximately every five years. This trend of order-of-magnitude increases in computer performance is slowing; in fact, the supercomputers presently available may already be within an order of magnitude of their technological limit. Heretofore, the limit was approximately 3 gigaflops, which corresponds to approximately 3 billion floating point operations per second and which is a function of the length of time it takes electrical signals to propagate through various wires and interconnections at approximately one half the speed of light. The drawback of the prior art systems is that many of the problems facing today's scientists and engineers can only be solved utilizing computers with performance capabilities far exceeding the 3 gigaflop limit.

Recent advances in supercomputer performance have been achieved by dividing applications among many processors working in parallel. Theoretically, parallel processing computers should provide performance in the teraflop range. While these computers provide increased capacity and speed, they also provide a new set of problems, namely, programming the new computers, handling the input/output operations and manipulating the data. The programming difficulties stem from the fact that no matter how well a program is written, it is extremely hard to achieve 100 percent utilization of multiple processors. The problem of handling input/output (I/O) operations and data manipulation arises because of the sheer volume of data associated [...]

[...] subsystem 30, typically comprises a staging memory that is responsible for transferring data between the SIMD computer 10 and the host 20.

In fine-grained, massively parallel SIMD systems, one single instruction after another is broadcast simultaneously to the processor array, with each instruction being applied to different pieces of data.

Traditionally, fine-grained SIMD parallel systems devoted their application emphasis to image-oriented computing, which resulted in the input/output system being designed only to handle regularly structured two-dimensional data such as image or matrix data. The input/output rate of a SIMD computer system was typically low due to the fact that for an N-processor SIMD system, arranged as a √N × √N mesh, only √N items of data are input or output to or from the system per machine cycle. Most fine-grained SIMD parallel systems are connected by mesh networks and their input/output is done by shifting data between a host and one boundary row/column of the SIMD system. This type of data transfer is considered one-dimensional. In addition, data must be pre-arranged by the host such that a particular datum can be assigned to a desired processor. The low input/output rate and the restricted capability of handling only regular data structures effectively confine SIMD computers to a narrow application domain.

A second disadvantage of the mesh-oriented row/column shifting scheme used in the prior art SIMD input/output systems is the difficulty in programming. Since the input/output function is overlapped with the current task execution, the programmer must interleave the instructions for computing with the instructions for input/output. This situation may lead to very unreadable code as well as force the programming to stay at the assembly language level.

A third aspect of the prior art input/output subsystems presently employed by SIMD computers is the handling of the corner turning function. The corner turning function is a phenomenon due to the different arrangement of data at the host and SIMD systems. For example, N 32-bit words are arranged in the host as N consecutive words, each being 32 bits wide. However, in transfer, these data words are distributed among 32 planes of SIMD memory with each plane containing N bits, each of which is associated with one processor. This situation arises due to the fact that in the SIMD system, all processors need to access the same memory location in the same machine cycle and the plane organization supports such memory accessing. The corner turning of regular data structures such as image or matrix data is supported by mesh-oriented row/column shifting.
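The corner turning described above (N host words of 32 bits each becoming 32 bit planes of N bits each) can be illustrated with a minimal software sketch. This is not the hardware mechanism of the invention; the function names `corner_turn` and `corner_turn_back`, the sample words, and the choice of plane 0 as the least significant bit are all assumptions made purely for illustration.

```python
def corner_turn(words, width=32):
    """Rearrange N host words into `width` bit planes of N bits each.

    planes[b][p] holds bit b of the word assigned to processor p.
    Plane 0 is taken to be the least significant bit (an assumption).
    """
    mask = (1 << width) - 1
    return [[((w & mask) >> b) & 1 for w in words] for b in range(width)]


def corner_turn_back(planes):
    """Inverse mapping: rebuild the host words from the bit planes."""
    n = len(planes[0]) if planes else 0
    return [sum(planes[b][p] << b for b in range(len(planes))) for p in range(n)]


# Hypothetical example with N = 4 processors, one 32-bit word per processor.
words = [0x00000001, 0x80000000, 0xDEADBEEF, 0x00000000]
planes = corner_turn(words)        # 32 planes, each 4 bits wide
restored = corner_turn_back(planes)
```

Each plane corresponds to one memory location read by all processors in the same machine cycle, which is why the host's word-by-word layout has to be turned before the SIMD array can use it.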
