Multiprocessor System with Cascaded Modules Combining Processors Through a Programmable Logic Cell Array (US Patent 5,465,375): Digital Signal

Total Pages: 16

File Type: PDF, Size: 1020 KB

US005465375A

United States Patent [19]
Thepaut et al.
[11] Patent Number: 5,465,375
[45] Date of Patent: Nov. 7, 1995

[54] MULTIPROCESSOR SYSTEM WITH CASCADED MODULES COMBINING PROCESSORS THROUGH A PROGRAMMABLE LOGIC CELL ARRAY

[75] Inventors: André Thepaut; Gerald Ouvradou, both of Plouzane, France
[73] Assignee: France Telecom, Paris, France
[21] Appl. No.: 4,582
[22] Filed: Jan. 14, 1993
[30] Foreign Application Priority Data: Jan. 14, 1992 [FR] France 92-00312
[51] Int. Cl.: G06F 15/16
[58] Field of Search: 395/200, 325, 800; 370/53, 85.9; 364/137

[56] References Cited

U.S. PATENT DOCUMENTS
4,200,930 4/1980 Rawlings et al. 395/200
…
4,443,850 4/1984 … 395/275
4,663,706 5/1987 Allen et al. 395/200
4,720,780 1/1988 Dolecek 395/800
4,816,993 3/1989 Takahashi et al. 395/250
5,086,498 2/1992 Tanaka et al. 395/200
5,165,023 11/1992 Gifford 395/325
5,291,611 3/1994 Davis et al. 395/800

FOREIGN PATENT DOCUMENTS
433142 12/1990 European Pat. Off.

OTHER PUBLICATIONS
S. Y. Kung, "Parallel Architectures for Artificial Neural Nets," IEEE, 1988, pp. 163-174.
S. Y. Kung et al., "Parallel Architectures for Artificial Neural Nets," IEEE International Conference on Neural Networks, San Diego, Calif., Jul. 24-27, 1988, 8 pages.

Primary Examiner: Krisna Lim
Attorney, Agent, or Firm: Jacobson, Price, Holman & Stern

[57] ABSTRACT

In a multiprocessor data processing system, modules are cascaded by means of intermodule buses. Each module comprises a data processing unit, a first memory, a logic cell array programmable into four input/output interfaces, a second memory and a specialized processing unit such as a digital signal processor (DSP). A first interface, the first memory and the data processing unit are interconnected by a module bus. A fourth interface, the second memory and the specialized processing unit are interconnected by another module bus. A feedback bus connects the second and third interfaces in the last and first modules for constituting a ring. Such a system is particularly intended for image recognition, such as digitalized handwritten digits for postal distribution.

2 Claims, 7 Drawing Sheets

[Drawing sheets 1-7; the figures themselves are not reproduced in this extraction. Recoverable captions: FIG. 1 (prior art), a processing/switching stage coupled to the communication network by a module bus BM and intermodular buses BIM; FIG. 2 (prior art), a neural network with neurons in hidden layers 2 to (N-1); FIG. 3 (prior art); FIGS. 4-7, the cascaded modules with host computer 1, data processing units 20_i, memories 21_i and 23_i, logic cell arrays 22_i with interfaces 221_i to 224_i, DSPs 24_i, intermodular buses B_i(i+1) and the feedback bus, with layer sizes U=256 and U=40 labeled on sheet 6; FIG. 8, flowcharts of the algorithms run by units 20_i, 22_i and 24_i, in which each DSP computes V_j = Σ_i W_ij·e_i, writes V into the array, computes f = sig(V) with -1 < sig(V) < 1, and the first module finally reads memory 23_1 and transmits the result to the host.]
MULTIPROCESSOR SYSTEM WITH CASCADED MODULES COMBINING PROCESSORS THROUGH A PROGRAMMABLE LOGIC CELL ARRAY

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to multiprocessor data processing systems in general.

2. Description of the Prior Art

The increasingly greater computational throughput requirements in data processing systems for applications such as image processing or scientific computation have led computer designers to introduce new processor architectures: parallel architectures. Three basic principles are used for introducing this parallelism in the new architectures. The distinction is made between:

segmented (or pipeline) architectures: this consists in breaking a task down into plural steps and in performing these steps independently by different processors. Every time an intermediary result is obtained after performance of a step, it is transmitted to the next processor, and so on. When a step is completed, the processor in charge of performing it is freed and thus becomes available to process new data. Presupposing the respective durations of performance of the different steps to be substantially equal, the period required to obtain the final results is then the duration of performance of one step, and not the duration of performance of the whole task;

array processor architectures or SIMD (Single Instruction, Multiple Data Stream) architectures. In this type of architecture, the increase in computational throughput is obtained by having the same instruction performed by a large number of identical processing units. This type of architecture is particularly well suited to vectorial processing; and

multiprocessor architectures or MIMD (Multiple Instruction, Multiple Data Stream) architectures. In such an architecture, several processors perform respective streams of instructions independently of one another. Communication between the processors is ensured either by a common memory and/or by a network interconnecting the processors.

Pending European Patent Application No. 433,142 filed Dec. 6, 1990 discloses an architecture of a multiprocessor data processing system in which the bus is shared between plural processor stages and is interfaced in each stage by a programmable LCA (Logic Cell Array) configured into plural input/output means and a switching means. The main advantage of such an architecture is to dispense each processor from bus request and management tasks, the latter being carried out in the logic cell array associated with the processor. Nonetheless, this architecture is not optimal for the multiprocessor approach to scientific computation applications. Each processor is in fact entrusted with all the tasks to be performed (excepting management of the bus). Numerous multiprocessor applications require considerable computational means, and a single unspecialized processor per stage restricts performance.

OBJECTS OF THE INVENTION

The main object of this invention is to remedy the preceding disadvantages for each stage of the above-mentioned architecture.

SUMMARY OF THE INVENTION

Accordingly, there is provided a multiprocessor data processing system embodying the invention including a plurality of cascaded modules. Each of the cascaded modules comprises a data processing unit connected to other data processing units in immediately adjacent downstream and upstream modules by way of a communication network. Each of the cascaded modules further comprises: a first memory; an additional processing unit; a second memory; and a programmable logic cell array. The programmable logic cell array is configurable into first, second, third and fourth input/output interfaces for temporarily memorizing data into memorized data, and into a central processing and switching circuit for processing the memorized data into processed data and switching the processed data towards one of the input/output interfaces. Each cascaded module further comprises: a first module bus for interconnecting the data processing unit, the first memory and the first input/output interface; and a second module bus for interconnecting the additional processing unit, the second memory and the fourth input/output interface.

The second and third input/output interfaces in each of the modules are interconnected to the third input/output interface in the immediately adjacent downstream module and the second interface in the immediately adjacent upstream module by two intermodular buses, respectively.

According to another embodiment, given that, on the one hand, the processing and switching means is configured once and for all for a given application and, on the other hand, several successive multiprocessor processings can be carried out by the processing units on a same data stream, the data already processed according to a first processing must be redistributed to the different modules for a next processing. In this case, the second and third input/output interfaces respectively in the programmable logic cell arrays of the last and first modules of the plurality of cascaded modules are connected by way of a feedback bus.

The invention also relates to a data processing method implemented in a multiprocessor data processing system embodying the invention. The method comprises: a first step consisting in loading a respective set of weights into the second memory of each of the cascaded modules via the communication network, and the input data into the first memory of the first module; and at least one set of second and third steps, the second step consisting in carrying out partial processings on the input data in the additional processing unit of each cascaded module as a function of the respective set of matrix multiplication weights in order to determine partial data, and the third step consisting in downloading the partial data to any one of the programmable logic cell arrays or any …
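Stripped of the bus-level detail, the recognition method is a neural-network layer evaluation partitioned across the cascaded modules: each module's DSP holds one slice of the weight matrix in its second memory, computes the partial sums V_j = Σ_i W_ij·e_i for its own neurons, and applies the squashing function sig(V) of FIG. 8. The C sketch below is a minimal sequential simulation of that partitioning, not the patent's firmware; the module count, the 256-input/40-neuron sizes (echoing the U=256 and U=40 labels on sheet 6) and the choice of tanh as the sigmoid are illustrative assumptions.

```c
#include <math.h>
#include <stdio.h>

#define N_IN      256   /* input vector size, e.g. a digitalized digit image   */
#define N_OUT     40    /* neurons in the layer                                */
#define N_MODULES 4     /* cascaded modules, each owning N_OUT/N_MODULES rows  */

static double W[N_OUT][N_IN];  /* weights: one row block per module's second memory */
static double e_in[N_IN];      /* input vector circulated over the intermodular buses */
static double f_out[N_OUT];    /* sigmoid outputs gathered back over the ring */

/* sig(V): any squashing function with -1 < sig(V) < 1; tanh is one choice. */
static double sig(double v) { return tanh(v); }

/* Work done by one module's DSP: a partial matrix-vector product followed
 * by the sigmoid, restricted to the slice of neurons this module owns. */
static void module_step(int m)
{
    int per = N_OUT / N_MODULES;
    for (int j = m * per; j < (m + 1) * per; j++) {
        double v = 0.0;
        for (int i = 0; i < N_IN; i++)
            v += W[j][i] * e_in[i];   /* V_j = sum_i W_ji * e_i */
        f_out[j] = sig(v);
    }
}

int main(void)
{
    /* Toy data standing in for weights loaded via the communication network. */
    for (int j = 0; j < N_OUT; j++)
        for (int i = 0; i < N_IN; i++)
            W[j][i] = (j == i % N_OUT) ? 0.01 : 0.0;
    for (int i = 0; i < N_IN; i++)
        e_in[i] = 1.0;

    for (int m = 0; m < N_MODULES; m++)  /* sequential here; concurrent in hardware */
        module_step(m);

    printf("f[0] = %f\n", f_out[0]);
    return 0;
}
```

In the patented system, the module_step calls would run concurrently, one per DSP, with the logic cell arrays and the intermodular and feedback buses circulating e_in and the partial results around the ring.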
Recommended publications
• An Area Efficient Real- and Complex-Valued Multiply-Accumulate SIMD Unit for Digital Signal Processors
An Area Efficient Real- and Complex-Valued Multiply-Accumulate SIMD Unit for Digital Signal Processors

Lukas Gerlach, Guillermo Payá-Vayá, and Holger Blume
Cluster of Excellence Hearing4all, Institute of Microelectronic Systems, Leibniz Universität Hannover, Appelstr. 4, 30167 Hannover, Germany
Email: {gerlach, guipava, blume}@ims.uni-hannover.de

Abstract—This paper explores a real- and complex-valued multiply-accumulate (MAC) functional unit for digital signal processors. MAC units with single-instruction-multiple-data (SIMD) support are often used to increase the processing performance in modern signal processing processors. Compared to real-valued SIMD-MAC units, the proposed unit uses the same multipliers to also support complex-valued SIMD-MAC and butterfly operations. The area overhead for the complex mode is small. Complex-valued operations speed up signal processing algorithms and make the execution more efficient in terms of power consumption. As a case study, a fast Fourier transform (FFT) is implemented for a VLIW-processor with a complex-valued SIMD butterfly extension. The proposed functional unit is quantitatively evaluated in terms of performance, silicon area, and power consumption.

In the signal processing field, the fast Fourier transform (FFT) is one of the most used transformations, which greatly pushes the performance requirements. The data parallelism inherent in the FFT processing allows operating with many independent MAC operations simultaneously. Therefore, a performance increment can be achieved by MAC units with SIMD mechanisms, but many instructions are still needed to operate the real and imaginary parts of complex numbers separately. The use of single instructions in DSPs, executing operations with complex numbers, can lead to a significant performance gain in many signal processing algorithms. A SIMD-MAC unit that can handle both complex and …
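To make the arithmetic concrete, here is a scalar C model (an illustration, not the authors' hardware unit) of the two operations the proposed unit fuses: a complex MAC, which expands into four real multiplies and four real adds that a real-valued SIMD-MAC unit must schedule as separate instructions, and the radix-2 FFT butterfly, which is one twiddle-factor multiply plus an add/subtract pair.

```c
#include <complex.h>
#include <stdio.h>

/* One complex-valued MAC: acc += a * b. Expanded into real arithmetic it
 * costs four real multiplies and four real adds. */
static double complex cmac(double complex acc, double complex a, double complex b)
{
    double re = creal(acc) + creal(a) * creal(b) - cimag(a) * cimag(b);
    double im = cimag(acc) + creal(a) * cimag(b) + cimag(a) * creal(b);
    return re + im * I;
}

/* Radix-2 decimation-in-time FFT butterfly: one complex multiply by the
 * twiddle factor w, then one add/subtract pair in place. */
static void butterfly(double complex *x0, double complex *x1, double complex w)
{
    double complex t = w * (*x1);
    *x1 = *x0 - t;
    *x0 = *x0 + t;
}

int main(void)
{
    double complex acc = cmac(0, 1 + 2*I, 3 - 1*I);   /* (1+2i)(3-i) = 5+5i */
    double complex a = 1 + 0*I, b = 0 + 1*I;
    butterfly(&a, &b, 1 + 0*I);
    printf("acc = %.1f%+.1fi, a = %.1f%+.1fi\n",
           creal(acc), cimag(acc), creal(a), cimag(a));
    return 0;
}
```

Collapsing either sequence into a single fused instruction is the kind of saving the paper quantifies.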
  • A Many-Core Architecture for In-Memory Data Processing
A Many-core Architecture for In-Memory Data Processing

Sandeep R Agrawal, Sam Idicula, Arun Raghavan, Evangelos Vlachos, Venkatraman Govindaraju, Venkatanathan Varadarajan, Cagri Balkesen, Georgios Giannikis, Charlie Roth, Nipun Agarwal, Eric Sedlar (Oracle Labs)
Email: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

ABSTRACT
For many years, the highest energy cost in processing has been data movement rather than computation, and energy is the limiting factor in processor design [21]. As the data needed for a single application grows to exabytes [56], there is clearly an opportunity to design a bandwidth-optimized architecture for big data computation by specializing hardware for data movement. We present the Data Processing Unit or DPU, a shared memory many-core that is specifically designed for high bandwidth analytics workloads. The DPU contains a unique Data Movement System (DMS), which provides hardware acceleration for data movement and partitioning operations at the memory controller that is sufficient to keep up with DDR bandwidth.

ACM Reference format: Sandeep R Agrawal, Sam Idicula, Arun Raghavan, Evangelos Vlachos, Venkatraman Govindaraju, Venkatanathan Varadarajan, Cagri Balkesen, Georgios Giannikis, Charlie Roth, Nipun Agarwal, and Eric Sedlar. 2017. A Many-core Architecture for In-Memory Data Processing. In Proceedings of MICRO-50, Cambridge, MA, USA, October 14–18, 2017, 14 pages. https://doi.org/10.1145/3123939.3123985

1 INTRODUCTION
A large number of data analytics applications in areas varying from business intelligence, health sciences and real time log and telemetry analysis already benefit from working sets that span …
• NVIDIA BlueField-2 Datasheet
NVIDIA BLUEFIELD-2 DPU
DATA CENTER INFRASTRUCTURE ON A CHIP

The NVIDIA® BlueField®-2 data processing unit (DPU) is the world's first data center infrastructure-on-a-chip optimized for traditional enterprises' modern cloud workloads and high performance computing. It delivers a broad set of accelerated software-defined networking, storage, security, and management services with the ability to offload, accelerate and isolate data center infrastructure. With its 200Gb/s Ethernet or InfiniBand connectivity, the BlueField-2 DPU enables organizations to transform their IT infrastructures into state-of-the-art data centers that are accelerated, fully programmable, and armed with "zero trust" security to prevent data breaches and cyber attacks.

By combining the industry-leading NVIDIA ConnectX®-6 Dx network adapter with an array of Arm® cores and infrastructure-specific offloads, BlueField-2 offers purpose-built, hardware-acceleration engines with full software programmability. Sitting at the edge of every server, BlueField-2 empowers agile, secured and high-performance cloud and artificial intelligence (AI) workloads, all while reducing the total cost of ownership and increasing data center efficiency.

The NVIDIA DOCA™ software framework enables developers to rapidly create applications and services for the BlueField-2 DPU. NVIDIA DOCA makes it easy to leverage DPU hardware accelerators, providing breakthrough data center performance, efficiency and security.

Key Features
Security: hardened isolation layer; hardware root of trust; IPsec/TLS and AES-XTS encryption acceleration; connection tracking for stateful firewall and IDS/IPS; regular expression (RegEx) matching processor.
Storage: NVIDIA GPUDirect® Storage; elastic block storage enabled by BlueField SNAP storage virtualization; compression and decompression acceleration; NVMe-oF acceleration; VirtIO-blk acceleration.
Networking: RoCE, Zero Touch RoCE; …
• BlueField as Platform
VISION ON NETWORKING IN THE AGE OF AI
January 2021
J.J. Vegas Olmos, L. Liss, T. Oved, Z. Binshtock, D. Goldenberg

DISCLAIMER
• I only have 15 minutes and the deck is certainly long, so I will go through some slides quickly; we can always engage in a conversation off-line ([email protected]).
• You will notice throughout the presentation that we are transmission-media agnostic: fiber, copper, wireless… not really relevant; each medium has its place and time.
• Whenever I say "Currently it is like this, but it will be like that", that is a research line that requires effort.

DATA PROCESSING UNIT VISION
NVIDIA's DPUs (Data Processing Units) are the smallest DC you can have. A DPU is:
• A network interface
• An SoC for programmability
• A GPU
• An acceleration engine…

A LITTLE BIT OF BACKGROUND: THE DATA CENTER IS THE NEW UNIT OF COMPUTING
Accelerated Disaggregated Infrastructure (ADI): software defined, hardware-accelerated. Accelerated Computing: the GPU is critical for AI & machine learning, and every workload will become AI accelerated; the DPU (data processing unit) is essential to disaggregate resources & make composable ADI. You can do three things with data: transport it, process it, use it. As of 2020, NVIDIA covers these three pillars with the incorporation of Mellanox Technologies.

DISAGGREGATION & COMPOSABILITY
All resources become virtualized and composable: GPUs, CPUs, Ethernet switches, Ethernet adapter cards (NICs), storage, MEM/PMEM. The NVIDIA network is the backplane of the data center; dynamically compose the computer you need!
• SmartNICs: Current Trends in Research and Industry
SmartNICs: Current Trends in Research and Industry

Tristan Döring, Henning Stubbe*, Kilian Holzinger*
*Chair of Network Architectures and Services, Department of Informatics, Technical University of Munich, Germany
Email: [email protected], [email protected], [email protected]

Abstract—With ever rising demand, modern cloud environments had to evolve fast in the last years. One of these novel problems are the increasing speed requirements in combination with present Software-Defined Networks (SDNs). This paper gives an overview of a new hardware trend resulting from this. We illustrate the demand, development, implementation and use of the network-accelerating SmartNICs. SmartNICs tackle existing problems of NIC hardware such as the lack of flexibility, a requirement for virtualized networks. Furthermore, the SmartNIC term will be analyzed to provide a universal definition.

Index Terms—SmartNIC, network accelerator, data processing unit, FPGA-based SmartNIC, ASIC-based SmartNIC, SoC-based SmartNIC

2. Trends and Technological Demands in Cloud/Hosting Industry
Before diving deeper into the topic of SmartNICs, this section will give a short overview of the current state of the industry. The new guiding trend is the virtualization of networks, storage, GPUs etc. These techniques generate network-related workloads not only on network devices, as virtualization cannot independently run on e.g. NICs. The network processing can be divided into two categories, i.e. the data plane and the control plane. The control plane is responsible for the control of the network structure, i.e. communications between network devices and assigning tasks to network devices. In other words, it is the implementation of network policies.
• Opportunities for Near Data Computing in MapReduce Workloads
OPPORTUNITIES FOR NEAR DATA COMPUTING IN MAPREDUCE WORKLOADS
by Seth Hintze Pugsley

A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science, School of Computing, The University of Utah, May 2015. Copyright © Seth Hintze Pugsley 2015. All Rights Reserved.

The dissertation of Seth Hintze Pugsley has been approved by the following supervisory committee members: Rajeev Balasubramonian (Chair), Alan L. Davis, Erik L. Brunvand, Feifei Li, and Vijayalakshmi Srinivasan; and by Ross Whitaker, Chair/Dean of the School of Computing, and David B. Kieda, Dean of The Graduate School.

ABSTRACT
In-memory big data applications are growing in popularity, including in-memory versions of the MapReduce framework. The move away from disk-based datasets shifts the performance bottleneck from slow disk accesses to memory bandwidth. MapReduce is a data-parallel application, and is therefore amenable to being executed on as many parallel processors as possible, with each processor requiring high amounts of memory bandwidth. We propose using Near Data Computing (NDC) as a means to develop systems that are optimized for in-memory MapReduce workloads, offering high compute parallelism and even higher memory bandwidth. This dissertation explores three different implementations and styles of NDC to improve MapReduce execution. First, we use 3D-stacked memory+logic devices to process the Map phase on compute elements in close proximity to database splits.
• DPUs: Acceleration Through Disaggregation
DPUS: ACCELERATION THROUGH DISAGGREGATION

Hussein Baligh, Senior Advisor, Sales Engineer Analyst, Dell Technologies ([email protected])
Ebrahim Serag, Sales Engineer Analyst, Dell Technologies ([email protected])
Sameh Talaat, Senior Advisor, Customer/Technical Training, Dell Technologies ([email protected])
Yehia Gaballah, Sales Engineer Analyst, Data Protection Solutions, Dell Technologies ([email protected])

Knowledge Sharing Article © 2020 Dell Inc. or its subsidiaries.

The Dell Technologies Proven Professional Certification program validates a wide range of skills and competencies across multiple technologies and products. From Associate, entry-level courses to Expert-level, experience-based exams, all professionals in or looking to begin a career in IT benefit from industry-leading training and certification paths from one of the world's most trusted technology partners. Proven Professional certifications include:
• Cloud
• Converged/Hyperconverged Infrastructure
• Data Protection
• Data Science
• Networking
• Security
• Servers
• Storage
• Enterprise Architect

Courses are offered to meet different learning styles and schedules, including self-paced On Demand, remote-based Virtual Instructor-Led and in-person Classrooms. Whether you are an experienced IT professional or just getting started, Dell Technologies Proven Professional certifications are designed to clearly signal proficiency to colleagues and employers. Learn more at www.dell.com/certification
  • Big Data Meets HPC Log Analytics: Scalable Approach to Understanding Systems at Extreme Scale
Big Data Meets HPC Log Analytics: Scalable Approach to Understanding Systems at Extreme Scale

Byung H. Park*, Saurabh Hukerikar*, Ryan Adamson†, and Christian Engelmann*
*Computer Science and Mathematics Division, †National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, USA
Email: {parkbh, hukerikarsr, adamsonrm, engelmannc}@ornl.gov

Abstract—Today's high-performance computing (HPC) systems are heavily instrumented, generating logs containing information about abnormal events, such as critical conditions, faults, errors and failures, system resource utilization, and about the resource usage of user applications. These logs, once fully analyzed and correlated, can produce detailed information about the system health, root causes of failures, and analyze an application's interactions with the system, providing valuable insights to domain scientists and system administrators. However, processing HPC logs requires a deep understanding of hardware and software components at multiple layers of the system stack. Moreover, most log data is unstructured and voluminous, making it more difficult for system users and administrators to manually inspect the data.

This system activity and event information is logged for monitoring and analysis. Large-scale HPC installations produce various types of log data. For example, job logs maintain a history of application runs, the allocated resources, their sizes, user information, and exit statuses, i.e., successful vs. failed. Reliability, availability and serviceability (RAS) system logs derive data from various hardware and software sensors, such as temperature sensors, memory errors and processor utilization. Network systems collect data about network link bandwidth, congestion and routing and link faults. Input/output (I/O) and storage systems produce logs that record perfor…
• Hardware Acceleration of Biophotonic Simulations by Tanner Young-Schultz
Hardware Acceleration of Biophotonic Simulations
by Tanner Young-Schultz

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science, Graduate Department of Electrical and Computer Engineering, University of Toronto. © Copyright 2020 by Tanner Young-Schultz

Abstract
The simulation of light propagation through tissue is important for medical applications like diffuse optical tomography (DOT), bioluminescence imaging (BLI) and photodynamic therapy (PDT). These applications involve solving an inverse problem, which works backwards from a light distribution to the parameters that caused it. These inverse problems have no general closed-form solution and therefore are approximated using iterative techniques. Increasing the accuracy of the approximation requires performing many light propagation simulations, which is time-consuming and computationally intensive. We describe algorithmic techniques to improve the performance, accuracy and usability of the fastest software simulator for forward light propagation, FullMonteSW. Additionally, we explore two acceleration methods using a GPU and an FPGA. Our results show that the GPU and FPGA accelerators improve the performance by 4-13x and 4x, respectively, over the software baseline. We give insight for improving the performance and usability of the GPU- and FPGA-accelerated simulators for various medical applications.

Acknowledgements
To my lab mates, I am extremely grateful for the fun times and support. Over the last two years, I have enjoyed everything from the laughs to the heated (friendly) arguments. To my supervisors, Vaughan Betz and Stephen Brown.
  • A Many-Core Architecture for In-Memory Data Processing
MICRO 2017 Submission #XXX – Confidential Draft – Do NOT Distribute!!

A Many-core Architecture for In-Memory Data Processing

ABSTRACT
We live in an information age, with data and analytics guiding a large portion of our daily decisions. Data is being generated at a tremendous pace from connected cars, connected homes and connected workplaces, and extracting useful knowledge from this data is quickly becoming an impractical task. Single-threaded performance has become saturated in the last decade, and there is a growing need for custom solutions to keep pace with these workloads in a scalable and efficient manner. A big portion of the power in analytics workloads involves bringing data to the processing cores, and we aim to optimize that. We present the Database Processing Unit or DPU, a shared memory many-core that is specifically designed for in-memory analytics workloads.

…program due to their SIMT programming model and intolerance to control flow divergence. Their dependence on high bandwidth graphics memory to sustain the large number of on-die cores severely constrains their memory capacity. A single GPU with 300+ GB/s of memory bandwidth still sits on a PCIe 3.0 link, reducing their data load and data movement capabilities, which are essential for high ingest streaming workloads as well as SQL queries involving large-to-large joins. We analyzed the performance of complex analytics queries on large data structures, and identified several key areas that can improve efficiency. Firstly, most analytics queries need lots of joins and group-bys. Secondly, these queries need to be broken down into simple streaming primitives, which …
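As one concrete reading of "simple streaming primitives", the sketch below implements a single-pass, hash-based group-by in plain C: it consumes a stream of (key, value) pairs once and maintains one running aggregate per key. This is only an illustrative scalar baseline, not the DPU's implementation; the table size, hash function and sum aggregate are arbitrary choices.

```c
#include <stdint.h>
#include <stdio.h>

#define TABLE_SIZE 1024   /* power of two so masking works as modulo */

typedef struct { uint32_t key; uint64_t sum; int used; } slot_t;
static slot_t table[TABLE_SIZE];

/* Streaming group-by primitive: fold one (key, value) pair into the
 * aggregate table using an open-addressed hash with linear probing. */
static void groupby_sum(uint32_t key, uint64_t value)
{
    uint32_t h = (key * 2654435761u) & (TABLE_SIZE - 1);  /* multiplicative hash */
    while (table[h].used && table[h].key != key)
        h = (h + 1) & (TABLE_SIZE - 1);                   /* probe next slot */
    table[h].key = key;
    table[h].sum += value;
    table[h].used = 1;
}

int main(void)
{
    uint32_t keys[] = {7, 3, 7, 7, 3};
    for (size_t i = 0; i < sizeof keys / sizeof keys[0]; i++)
        groupby_sum(keys[i], 10);           /* stream the input once */
    for (int i = 0; i < TABLE_SIZE; i++)
        if (table[i].used)
            printf("key %u -> sum %llu\n", table[i].key,
                   (unsigned long long)table[i].sum);
    return 0;
}
```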
• Unit V – SBS1203 – Computer Architecture
CENTRAL PROCESSING UNIT
SCHOOL OF COMPUTING, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
UNIT V – SBS1203 – COMPUTER ARCHITECTURE

5.1 GENERAL PURPOSE REGISTERS

The output of each register is connected to two multiplexers (MUX) to form the two buses A and B. The selection lines in each multiplexer select one register or the input data for the particular bus. The A and B buses form the inputs to a common arithmetic logic unit (ALU). The operation selected in the ALU determines the arithmetic or logic microoperation that is to be performed. The result of the microoperation is available for output data and also goes into the inputs of all the registers. The register that receives the information from the output bus is selected by a decoder. The decoder activates one of the register load inputs, thus providing a transfer path between the data in the output bus and the inputs of the selected destination register.

The control unit that operates the CPU bus system directs the information flow through the registers and ALU by selecting the various components in the system. For example, to perform the operation R1 <- R2 + R3, the control must provide binary selection variables to the following selector inputs (modeled in the C sketch after this list):
1. MUX A selector (SELA): to place the content of R2 into bus A.
2. MUX B selector (SELB): to place the content of R3 into bus B.
3. ALU operation selector (OPR): to provide the arithmetic addition A + B.
4. Decoder destination selector (SELD): to transfer the content of the output bus into R1.
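The four selection variables map directly onto a small simulation. The following C sketch is an illustrative model (register width and operation set chosen arbitrarily) of one micro-operation cycle: SELA and SELB gate two registers onto buses A and B, OPR selects the ALU function, and SELD tells the decoder which register to load, reproducing the transfer R1 <- R2 + R3.

```c
#include <stdint.h>
#include <stdio.h>

#define NREGS 8

typedef enum { OPR_ADD, OPR_SUB, OPR_AND, OPR_OR } opr_t;

static uint16_t regs[NREGS];   /* general purpose registers R0..R7 */

/* One micro-operation cycle of the bus system described above. */
static void cycle(int sela, int selb, opr_t opr, int seld)
{
    uint16_t a = regs[sela];   /* MUX A places regs[sela] on bus A */
    uint16_t b = regs[selb];   /* MUX B places regs[selb] on bus B */
    uint16_t out;
    switch (opr) {             /* ALU performs the selected function */
    case OPR_ADD: out = a + b; break;
    case OPR_SUB: out = a - b; break;
    case OPR_AND: out = a & b; break;
    default:      out = a | b; break;
    }
    regs[seld] = out;          /* decoder loads the destination register */
}

int main(void)
{
    regs[2] = 30;
    regs[3] = 12;
    cycle(2, 3, OPR_ADD, 1);   /* R1 <- R2 + R3 */
    printf("R1 = %u\n", regs[1]);   /* prints R1 = 42 */
    return 0;
}
```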
  • In Storage Process, the Next Generation of Storage System
University of Rhode Island, DigitalCommons@URI, Open Access Dissertations, 2019

In storage process, the next generation of storage system
Dongyang Li, University of Rhode Island, [email protected]

Follow this and additional works at: https://digitalcommons.uri.edu/oa_diss

Recommended Citation: Li, Dongyang, "In storage process, the next generation of storage system" (2019). Open Access Dissertations. Paper 839. https://digitalcommons.uri.edu/oa_diss/839

This Dissertation is brought to you for free and open access by DigitalCommons@URI. It has been accepted for inclusion in Open Access Dissertations by an authorized administrator of DigitalCommons@URI. For more information, please contact [email protected].

PROCESSING IN STORAGE, THE NEXT GENERATION OF STORAGE SYSTEM
BY DONGYANG LI

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical Engineering, University of Rhode Island, 2019. Dissertation committee: Major Professor Qing Yang; Jien-Chung Lo; Lutz Hamel; Manbir Sodhi; Nasser H. Zawia, Dean of the Graduate School.

ABSTRACT
In conventional computer systems, software relies on the CPU to process applications and assign computation tasks to heterogeneous accelerators such as GPUs, TPUs and FPGAs. This requires the CPU to fetch data out of the storage device and move the data to the heterogeneous accelerators. After the accelerators complete their computation tasks, the results are flushed to the main memory of the host server for software applications. In this architecture, the heterogeneous accelerators are located far away from the storage device. Data moves over the system bus (NVM-Express/PCI-Express), which requires a lot of transmission time and bus bandwidth.