Sum of Product Arithmetic Techniques

Europäisches Patentamt *EP001229438A2* (19) European Patent Office Office européen des brevets (11) EP 1 229 438 A2 (12) EUROPEAN PATENT APPLICATION (43) Date of publication: (51) Int Cl.7: G06F 7/544 07.08.2002 Bulletin 2002/32 (21) Application number: 01307538.7 (22) Date of filing: 05.09.2001 (84) Designated Contracting States: (72) Inventor: Tsuji, Masayuki AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU Nakahara-ku, Kawasaki-shi, Kanagawa 211 (JP) MC NL PT SE TR Designated Extension States: (74) Representative: Hitching, Peter Matthew et al AL LT LV MK RO SI Haseltine Lake & Co., Imperial House, (30) Priority: 31.01.2001 JP 2001024153 15-19 Kingsway London WC2B 6UD (GB) (71) Applicant: FUJITSU LIMITED Kawasaki-shi, Kanagawa 211-8588 (JP) (54) Sum of product arithmetic techniques (57) A SIMD (Single Instruction Stream Multiple Da- tors as one set, and by replacing a 2p-lth accumulator ta Stream) sum of product arithmetic method is dis- with an adjacent 2pth (p = 1, ···, n) accumulator, without closed. The method enables a concurrent execution of changing a sequence of accumulator addresses, in the 2n (where n is a natural number) parallel sum of product set, as accumulator addresses to be allocated to sum arithmetic operations. The SIMD sum of product arith- of product arithmetic circuits for the SIMD sum of prod- m metic is executed using 2 (m = 0, ···, log2n) accumula- uct arithmetic. EP 1 229 438 A2 Printed by Jouve, 75001 PARIS (FR) EP 1 229 438 A2 Description [0001] The present invention relates to sum of product arithmetic techniques, and more particularly, to an SIMD sum of product arithmetic method, an SIMD sum of product arithmetic circuit, and a semiconductor integrated circuit device 5 equipped with the SIMD sum of product arithmetic circuit. [0002] In recent years, attention has been paid to SIMD (Single Instruction Stream-Multiple Data Stream) arithmetic for concurrently processing (computing) a plurality of data with one instruction. In order to realize a high-functional and high-performance system for carrying out (executing) high-speed data processing and image processing, as in a color laser printer or a navigation system, for example, it is necessary to use a powerful processor having an SIMD arithmetic 10 function. [0003] The SIMD technique is a control system for concurrently processing a plurality of data with one instruction, and SIMD matrix arithmetic (processing) is an SIMD (sum of product) arithmetic technique capable of executing matrix arithmetic at high speed. In the sum of product arithmetic circuit for carrying out such SIMD matrix arithmetic, a new path (process) for copying or swapping the arguments of a matrix arithmetic is necessary, and this has lowered the 15 processing performance. Therefore, there has been a strong demand for the provision of a sum of product arithmetic circuit capable of executing an SIMD arithmetic at high speed. [0004] The prior art and the problems associated with the prior art will be described in detail, later, with reference to accompanying drawings. [0005] It is desirable to provide a sum of product arithmetic technique capable of executing SIMD arithmetic for 20 carrying out a high-speed matrix arithmetic without decreasing a maximum operation frequency, without increasing latency, and without substantially changing a circuit. [0006] According to the present invention, there is provided a sum of product arithmetic method of enabling a concurrent execution of 2n (where n is a natural number) parallel sum of product arithmetic (operations), wherein the sum m of product arithmetic is executed using 2 (m = 0, ···, log2n) accumulators as one set, and by replacing a 2p-1th 25 accumulator with an adjacent 2pth (p = 1, ···, n) accumulator, without changing sequence of accumulator addresses in the set as accumulator addresses to be allocated to sum of product arithmetic circuits for the sum of product arithmetic. [0007] Further, according to the present invention, there is provided a sum of product arithmetic method of enabling an SISD sum of product arithmetic circuit having 2n (where n is a natural number) sum of product execution units that 30 are operated concurrently, to execute sum of product arithmetic, wherein the sum of product execution units are used m for sum of product arithmetic; and the sum of product arithmetic is executed using 2 (m = 0, ···, log2n) accumulators as one set, and by replacing a 2p-1th accumulator with an adjacent 2pth (p = 1, ..., n) accumulator, without changing sequence of accumulator addresses in the set as accumulator addresses to be allocated to sum of product execution units for the sum of product arithmetic. 35 [0008] The sum of product arithmetic may be executed by replacing the accumulator addresses. The sum of product arithmetic may be an SIMD sum of product arithmetic. [0009] According to the present invention, there is also provided a sum of product arithmetic circuit having 2n (where n is a natural number) sum of product execution units that are operated concurrently, each sum of product execution unit being equipped with a multiplier, an adder and an accumulator, wherein the sum of product arithmetic circuit ex- 40 m ecutes sum of product arithmetic using 2 (m = 0, ···, log2n) accumulators as one set, and by replacing a 2p-1th accumulator with an adjacent 2pth (p = 1, ···, n) accumulator, without changing sequence of accumulator addresses in the set as accumulator addresses to be allocated to sum of product execution units for the sum of product arithmetic. [0010] Further, according to the present invention, there is also provided a semiconductor integrated circuit device having a semiconductor chip on which a sum of product arithmetic circuit is formed, the sum of product arithmetic 45 circuit comprising 2n (where n is a natural number) sum of product execution units that are operated concurrently, each sum of product execution unit being equipped with a multiplier, an adder and an accumulator, wherein the sum of m product arithmetic circuit executes sum of product arithmetic using 2 (m = 0, ···, log2n) accumulators as one set, and by replacing a 2p-1th accumulator with an adjacent 2pth (p = 1, ···, n) accumulator, without changing sequence of accumulator addresses in the set as accumulator addresses to be allocated to sum of product execution units for the 50 sum of product arithmetic. [0011] The sum of product arithmetic circuit may further comprise first selectors, each provided for each sum of product execution unit, for supplying data of each accumulator by switching data of the accumulator; and second selectors, each provided for each accumulator, for switching a processing result of each sum of product execution unit, and storing the switched processing result, and wherein the sum of product arithmetic circuit may control the first and 55 second selectors to make the sum of product execution units execute a predetermined sum of product arithmetic. Each of the first and second selectors may be switch controlled based on a control signal output from an address decoder to which a swap instruction signal is supplied. [0012] The sum of product arithmetic circuit may further comprise third selectors, for switching source data addresses 2 EP 1 229 438 A2 between addresses for an SISD sum of product arithmetic and addresses for the sum of product arithmetic, and for inputting the source data addresses to a memory from which source data is to be supplied to the 2n sum of product execution units. Each of the first, second and third selectors may be switch controlled based on a valid signal for selecting one sum of product execution unit as a valid unit. 5 [0013] The sum of product arithmetic circuit may further comprise fourth selectors, for switching resources to be supplied to an instruction decoder for generating a control signal necessary for controlling the 2n sum of product execution units, between resources for an SISD sum of product arithmetic and resources for the sum of product arithmetic, and using the selected resources. The sum of product arithmetic may be an SIMD sum of product arithmetic. [0014] The present invention will be more clearly understood from the description of the preferred embodiments as 10 set forth below with reference to the accompanying drawings, wherein: Fig. 1 is a diagram showing one example of a process of a general SISD sum of product arithmetic; Fig. 2 is a diagram showing one example of a process of a parallel SISD sum of product arithmetic; Fig. 3A and Fig. 3B are diagrams for explaining problems of a conventional sum of product arithmetic; 15 Fig. 4 is a time chart showing a matrix arithmetic according to the present invention as compared with a conventional matrix arithmetic; Fig. 5 is a block diagram showing one example of an SIMD sum of product arithmetic circuit relating to a conventional technique; Fig. 6 is a block diagram showing an SIMD sum of product arithmetic circuit according to one embodiment of the 20 present invention; Fig. 7 is a block diagram showing one example of two parallel SISD sum of product arithmetic circuits relating to a relevant technique; Fig. 8 is a block diagram showing an SIMD sum of product arithmetic circuit according to another embodiment of the present invention; 25 Fig. 9 is a diagram for explaining the operation of the SIMD sum of product arithmetic circuit shown in Fig. 8; and Fig. 10 is a block diagram showing a total structure of a processor to which the SIMD sum of product arithmetic circuit relating to the present invention is applied. [0015] Before describing embodiments of the present invention in detail, conventional sum of product arithmetic 30 circuits and their problems will be explained with reference to the drawings.

Sum of Product Arithmetic Techniques

Intel ® Atom™ Processor E6xx Series SKU for Different Segments” on Page 30 Updated Table 15

Adding Support for Vector Instructions to 8051 Architecture

Datasheet, Volume 2

MPC500 Family MPC509 User's Manual

Introduction to CUDA Programming

Address Decoding Large-Size Binary Decoder: 28-To-268435456 Binary Decoder for 256Mb Memory

Covert and Side Channels Due to Processor Architecture*

Special Address Generation Arrangement

Low Overhead Memory Subsystem Design for a Multicore Parallel DSP Processor

The Memory System

The CPU and Memory

Intel SGX Explained