ISSN 2321 3361 © 2021 IJESC

Research Article Volume 11 Issue No.05

An Analysis of Recent Applications of Multiple Instruction Multiple Data (MIMD)

Dosti Kh. Abbas
Faculty of Engineering, Soran University
Soran, Erbil, Kurdistan Region, Iraq

Abstract: MIMD stands for Multiple Instruction Multiple Data, an architecture for achieving parallelism in which multiple instruction streams operate on multiple data streams to raise computer performance. MIMD works with two programming models, shared memory and distributed memory, and each model has its benefits and drawbacks. MIMD has been an interesting subject for many researchers in past and recent years: in the literature, many studies have been carried out and different applications implemented on both architectures of MIMD computer devices. Nowadays, parallel architecture has improved from most angles, but it still requires research and the testing of applications on its different kinds of architecture. Our objective in this article review is to collect and review some of the work that has been done on the MIMD architecture, and to evaluate and analyze the outcomes of the reviewed research.

Keywords: MIMD, Parallel Computer Architecture, Shared Memory, Distributed Memory.

I. INTRODUCTION

The requirement for increased performance in computer systems cannot be over-estimated, as it improves the processing of instructions. Parallel processing [1] is an approach in which instructions and data are operated on at the same time across computing devices. The technique works by partitioning large problems into smaller parts that are then solved concurrently. Ideally, this makes execution faster, since many processors work on the program at once, but it is often hard to partition a program so that different CPUs can execute separate segments without interfering with each other. Parallelism has nevertheless become a dominant paradigm in architecture: machines now have multi-core processors, and in a multi-core processor instructions execute at the same time on any of the cores, with each processor carrying out its instructions separately depending on its structure. In his taxonomy, Michael J. Flynn categorized parallel computer architectures into four kinds depending on the number of instruction streams and data streams that they can handle at a time [2]. Our focus here is on the MIMD parallel architecture. MIMD is a multiprocessor design in which different sets of instructions operate simultaneously: in each cycle, the processors fetch instructions independently and carry them out at the same time on numerous CPUs.

It is hard to identify the first MIMD multiprocessor. Surprisingly, the first computer from the Eckert-Mauchly Corporation, for instance, had duplicate units to improve availability. Two of the best-documented multiprocessor projects were undertaken during the 1970s at Carnegie Mellon University. The first of these was C.mmp, which comprised 16 PDP-11s connected through a crossbar to 16 memory units. It was among the first multiprocessors with more than a couple of processors, and it had a shared memory programming model. A large part of the focus of the research in the C.mmp project was on software, particularly in the OS area. A later multiprocessor, Cm*, was a cluster-based multiprocessor with a distributed memory and a non-uniform access time. The absence of caches and the long remote-access latency made data placement critical. Many of the ideas in these multiprocessors would be reused during the 1980s, when the microprocessor made it much less expensive to build multiprocessors. With the essential exception of the parallel vector multiprocessors and, more recently, of the IBM Blue Gene line, all other recent MIMD computers have been built from off-the-shelf microprocessors using either a logically central memory or a distributed memory, together with an interconnection network.

MIMD computer architecture has been an interesting subject for many researchers in past and recent years. In the literature, many studies have been carried out and different applications implemented on both architectures of MIMD computers. Nowadays, parallel computer architecture has improved from most angles, but it still requires research and the testing of applications on its different kinds of architecture. Our objective in this article review is to collect and review some works that have been done on MIMD machines, and to evaluate and analyze the outcomes of the reviewed research. The rest of this study is structured as follows. In section two, the MIMD architecture and its types are presented. Some recent works on MIMD computer machines are reviewed in section three. Section four contains the discussion of our study. Finally, we conclude the study in section five.

II. MIMD ARCHITECTURE

In the field of computing, MIMD is a technique used to achieve parallelism. Devices that use MIMD have a number of processors that function independently and asynchronously, so different instructions may be executed on different parts of the data by different processors at any time. MIMD structures may be used in various application fields, for example computer-aided design, modeling, simulation, and communication.

MIMD computers can be of two categories, distributed memory and shared memory; the categorization depends on how memory is accessed by the MIMD processors. Distributed memory computers may use mesh or hypercube interconnection schemes, while shared memory computers may be of the extended, bus-based, or hierarchical types [3]. The MIMD architecture contains a set of N individual, tightly coupled processors; each processor may have a memory that is common to all of the processors, or a local memory that cannot be directly accessed by the other processors.

2.1 SHARED MEMORY

The MIMD structure works with two kinds of memory, and shared memory is one of them. All Processing Elements (PEs) share a common memory address space: each processing element can access any memory module directly through an interconnection network, and the PEs communicate by writing into the common address space. The PEs are independent, yet they have a common memory [4]. The structure of a shared memory MIMD is illustrated in fig. 1.

Figure.1. Shared Memory [4]

The features of the shared memory MIMD architecture are as follows:
• It consists of a set of processors and memory modules.
• Any memory module can be directly accessed by any processor through an interconnection network.
• A universal address space is formed by the group of memory modules shared between the processors.

A vital advantage of this design is that it is very simple to program, since there are no explicit connections between processors: communication is addressed through the global memory store.
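To make the model concrete, a minimal sketch follows, assuming Python's multiprocessing module as a stand-in for the processors and the interconnection network (the function and variable names are ours, purely illustrative). Two workers run different instruction streams against one shared array, which plays the role of the common address space:

```python
# Sketch of the shared memory MIMD model: independent workers run
# different instruction streams against one common address space.
# Python's multiprocessing shared Array stands in for the global store.
from multiprocessing import Process, Array, Lock

def doubler(mem, lock, lo, hi):
    # One instruction stream: double its slice of shared memory.
    with lock:
        for i in range(lo, hi):
            mem[i] *= 2

def summer(mem, lock, out_index):
    # A different instruction stream: accumulate a sum into a shared cell.
    with lock:
        mem[out_index] = sum(mem[:out_index])

if __name__ == "__main__":
    shared = Array('d', [1.0, 2.0, 3.0, 4.0, 0.0])  # common address space
    lock = Lock()  # writes to shared memory must be coordinated
    workers = [Process(target=doubler, args=(shared, lock, 0, 4)),
               Process(target=summer, args=(shared, lock, 4))]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(list(shared))  # e.g. [2.0, 4.0, 6.0, 8.0, 20.0] if doubler ran first
```

The lock mirrors the coordination a real shared memory MIMD machine needs when several processors write into the same memory module.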

2.2 DISTRIBUTED MEMORY

The other type of MIMD architecture is distributed memory. In this kind of architecture, every processing element has its own address space, and the elements interact with one another through message passing: a processing element blocks while waiting for a message. The programmer is responsible for distributing the data to the individual processing elements. Communication is based on two operations, send and receive, and the processors and memories are connected by an interconnection network. Each processing element comprises a combination of a memory and a processor, and the elements interact with each other using messages [4]. The structure of a distributed memory MIMD is illustrated in fig. 2.

Figure.2. Distributed Memory [4]

The features of the distributed memory MIMD architecture are as follows:
• It replicates memory/processor pairs, called processing elements (PEs), and uses an interconnection network to link them.
• Each PE can communicate with the others by sending messages.

By giving every processor its own memory, this type of architecture bypasses the downsides of the shared memory architecture: a processor may only access the memory that is directly attached to it.
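A matching sketch of the distributed memory style, under the same illustrative assumptions: each worker below owns its data block privately, a Pipe stands in for the interconnection network, and the receiver blocks until a message arrives, which is exactly the send/receive pattern described above:

```python
# Sketch of the distributed memory MIMD model: each PE owns its memory
# and cooperates only through explicit send/receive messages.
from multiprocessing import Process, Pipe

def worker(conn):
    local = conn.recv()                  # blocks until the host sends a data block
    partial = sum(x * x for x in local)  # compute on private memory only
    conn.send(partial)                   # explicit send back to the host
    conn.close()

if __name__ == "__main__":
    data = list(range(8))
    conns, procs = [], []
    for lo in (0, 4):                    # the programmer distributes the data
        parent, child = Pipe()
        p = Process(target=worker, args=(child,))
        p.start()
        parent.send(data[lo:lo + 4])
        conns.append(parent)
        procs.append(p)
    total = sum(c.recv() for c in conns)  # blocking receives, then combine
    for p in procs:
        p.join()
    print(total)                          # 140 = sum of squares of 0..7
```

Since no address space is shared there is nothing to lock: the send and receive operations carry both the data and the synchronization.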

III. MIMD ARCHITECTURE APPLICATIONS

Writing parallel code is more complicated than writing sequential code, and different models and architectures are beneficial for different applications, MIMD in particular. The features of the application indicate which structure is more suitable for implementing it and which parallel hardware is preferable for the application to use. In this section, research works that used MIMD computer architecture for scientific applications are reviewed across different areas of computing.

The neural network is one of the most used and most frequently proposed algorithms for learning and training software applications in different areas. In our article review, we examined two articles in which MIMD was used to evaluate neural networks and the corresponding architectures. In the first of these research works [5], Abbas proposed the use of an MIMD architecture: the process of training neural networks on parallel structures was used to assess the performance of such parallel machines. The author explored the implementation of backpropagation (BP) on the Alex AVX-2 coarse-grained MIMD machine, and a host–worker parallel implementation was developed to train various models on the NetTalk dictionary. First, a computational model using a single processor was developed to complete the learning cycle.

Additionally, a communication model was created for the host–worker topology to estimate the communication cost. The two models were then used to forecast the performance of the machine as more processors were used, and a comparison was carried out with the actual measured performance of the parallel implementation. The study's results show that the two models can be used effectively to predict the machine's performance on the NetTalk problem. The scheme involves decomposing the training data into blocks and distributing them evenly over the worker processors; a complete copy of the back-propagation network, with the same architecture and parameters, is required on each worker for this data slicing, and each copy works on its own portion of the dataset.
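The data slicing just described can be sketched as follows. This is a hypothetical illustration of the general host–worker pattern, not the authors' code, and a one-weight linear model stands in for the back-propagation network:

```python
# Sketch of host-worker data-parallel training: identical model copies,
# disjoint data blocks, partial gradients averaged by the host.
from multiprocessing import Pool

def worker_gradient(args):
    w, block = args
    # Each worker: same parameters, its own data block.
    # Gradient of mean squared error for the model y = w * x.
    g = 0.0
    for x, y in block:
        g += 2 * (w * x - y) * x
    return g / len(block)

if __name__ == "__main__":
    data = [(x, 3.0 * x) for x in range(16)]   # true weight is 3.0
    blocks = [data[i::4] for i in range(4)]    # slice the data over 4 workers
    w = 0.0
    with Pool(4) as pool:
        for step in range(200):                # synchronous training steps
            grads = pool.map(worker_gradient, [(w, b) for b in blocks])
            w -= 0.001 * sum(grads) / len(grads)  # host averages and updates
    print(round(w, 3))                         # converges near 3.0
```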

The main objectives of the author's work were to benchmark the Alex machine against the NetTalk dictionary, to compare the forecasted performance with the measured one, and to compare the machine with other platforms. The parallel back-propagation model used to train the NetTalk dictionary was mapped onto the MIMD device called the Alex AVX-2. This device has 64 compute nodes and 8 root nodes.

For computation purposes, each node runs an Intel i860, and for both communication and computation each node employs an INMOS transputer (T805). The architecture is based on distributed memory, while within each node a common memory is shared by the two processors.

Another research study that used neural network techniques is the work in [6], which presented three simple applications of MIMD computers to improving neural network (NN) model performance. Speeding up the training of MLPs was the target of the first two techniques, called pattern partitioning and competing networks, the latter based on a contest among NNs with different weights spread over the processing elements. The third technique, a parallel search approach, performed a rather "brute-force" search for the optimal weights of the energy function of Hopfield NNs applied to optimization problems. A mesh parallel computer and a binary-tree computer were used to demonstrate the power of the presented methods: the two MIMD parallel computers used were the Coral68K and the Fujitsu AP1000. Fig. 3 illustrates the Coral68K, which is a binary-tree MIMD. In absolute terms it is a slow computer; however, it is still useful as a computer model and for assessing the speed-up gained through parallel algorithms. It provides 63 PEs, connected in a binary tree as shown in the figure. Each PE has 512 KB of local memory, and the processors do not share memory. The top processor is connected directly to the host, no broadcast capability is provided, and each PE essentially sends messages in three directions: up, right, and left. This design makes the MIMD computer easy to expand and allows a simple message-passing scheme, but it also creates serious communication congestion: the distance between processors grows quickly as more PEs are used by an application, which hampers communication.

Figure.3. Coral68K binary-tree MIMD computer [6]

The Fujitsu AP1000, a very powerful computer at the time, is the other MIMD computer used in [6]; it is illustrated in fig. 4. In this MIMD machine, each PE (SPARC IU+FPU) has 16 MB of local memory and workstation-class processing capability. Communications on the AP1000 are very fast but, as with the Coral68K, there is no shared memory. It has two routing networks, one for random communication between PEs and one used only for broadcast; as shown in the figure, the latter is implemented as a two-dimensional torus. Additionally, the use of worm-hole routing makes the communication latency practically independent of the physical distance between processors.

Figure.4. AP1000 torus MIMD computer [6]

One of the recent deep learning designs is the Generative Adversarial Network (GAN), which is used to provide synthetic data from limited datasets. Generative Adversarial Networks are a further extension of deep learning into fields such as robotics, medicine, and content synthesis. The researchers in [7] proposed an architecture named GANAX, built over conventional convolution accelerators, to alleviate the sources of inefficiency associated with the acceleration of GANs, making it possible to design the first Generative Adversarial Network accelerator. The study proposed a unified MIMD-SIMD design for GANAX: it leverages repeated patterns in the computation to create distinct microprograms, and the interleaving of MIMD and SIMD operation is executed at the granularity of a single microprogrammed operation.

To create an embedded computer vision coprocessor called PRECISION, a hybrid SIMD/MIMD parallel structure was proposed in [8]. It is structured to give flexible and fast computation for the image processing tasks of vision applications, and the authors prototyped a 32-bit, 128-unit device. The System-on-Chip's main modules include the presented coprocessor, PRECISION, while the embedded CPU is intended to manage irregular program flows and data. The architecture of the coprocessor consists of three major parts: a Programmable Output Processor (POP), a Programmable Input Processor (PIP), and an array of Processing Elements (PEs).
In SIMD mode, the same instruction is executed by all PEs over their own data, which resides in local memory; a side-to-side network connecting only adjacent PEs in a one-dimensional array is used to exchange data. In MIMD mode, the PEs store both their data and their program in local memory; they can still exchange data using a local network, a two-dimensional, auto-synchronized, distributed network.
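The contrast between the two modes can be caricatured in a few lines of Python (an illustrative sketch of the execution models only, not the PRECISION instruction set; the function names are invented):

```python
# Caricature of the two operating modes of a hybrid SIMD/MIMD array.
from multiprocessing import Pool

def threshold(block):       # one shared instruction stream, used in SIMD mode
    return [1 if v > 8 else 0 for v in block]

def edge(block):            # distinct per-PE programs, used in MIMD mode
    return [b - a for a, b in zip(block, block[1:])]

def smooth(block):
    return [(a + b) / 2 for a, b in zip(block, block[1:])]

if __name__ == "__main__":
    local = [[3, 9, 12, 2], [7, 1, 15, 4]]    # each PE's local memory
    with Pool(2) as pool:
        # SIMD mode: every PE executes the same instruction on its own data.
        print(pool.map(threshold, local))
        # MIMD mode: each PE pairs its own program with its own data.
        tasks = [pool.apply_async(f, (b,)) for f, b in zip([edge, smooth], local)]
        print([t.get() for t in tasks])
```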

Due to their computational power and flexibility, multi-layer MIMD computers became a viable solution for computer vision. Nevertheless, the performance of the parallel programs executed on these frameworks depends on the efficiency of inter-processor communication. Hence, based on these points, a software system was proposed by the researchers in [9] to handle complex communication operations. They established a communication design together with routing and switching techniques. Their system is organized into two units, the processing unit (PU) and the control unit (CU). The PU is a multi-DSP board dedicated to executing vision algorithms, while the CU is an ordinary embedded board based on an Intel core; the CU interacts with the user and controls operations at the processing layers. They used a standard CPCI 6U board for all the circuits of the system implementation, and the CU acts as the main controller of the CPCI bus and initiates communication on the bus. As shown in fig. 6, the topology between the DSPs and the x86 is a single-level tree network.

Figure.6. The tree topology of the x86 CPU and multiple DSPs connected by the CPCI bus [9]

Eight TS201 chips are located on each multi-DSP board. To communicate with other DSPs, each TS201 uses four link ports and can be connected to up to three more DSPs. As shown in fig. 7, the DSPs can compose a three-dimensional hypercube network.

Figure.7. The hypercube topology of multiple DSPs connected by link ports [9]
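The hypercube wiring follows from a simple addressing rule: in a d-dimensional hypercube each node is labelled with d bits and is linked to the d nodes whose labels differ in exactly one bit, so eight DSPs with three cube links each suffice for d = 3. A small sketch of the rule (our own illustration):

```python
# Hypercube addressing: the neighbors of a node are obtained by
# flipping each of its d address bits in turn.
def neighbors(node: int, d: int) -> list[int]:
    return [node ^ (1 << bit) for bit in range(d)]

# Eight DSPs in a 3-dimensional hypercube: exactly 3 neighbors per node,
# so three of the four TS201 link ports are enough to build the cube.
for node in range(8):
    links = [format(n, "03b") for n in neighbors(node, 3)]
    print(f"DSP {node:03b} links to {links}")

# Routing: the set bits of src ^ dst are the dimensions a message must
# cross, e.g. 010 -> 111 crosses dimensions 0 and 2.
print(format(0b010 ^ 0b111, "03b"))  # prints 101
```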

To amortize the cost of MIMD execution, the researchers in [7] tried to decouple data access from data processing in GANAX. This decoupling leads to a design that breaks each processing engine into an execute micro-engine and an access micro-engine, and it extends the idea of decoupled access-execute architectures to the finest computation granularity, that of an individual operand. To mitigate resource underutilization, they devised a unified SIMD-MIMD structure that obtains the benefits of the SIMD and MIMD execution models simultaneously. In [8], according to our analysis, functional units are shared by the combined SIMD/MIMD parallel architecture in order to reduce hardware consumption, and the architecture includes a flexible network that adapts the data path to the algorithm being executed. We noticed that this solution gives enough adaptability to exploit various forms of instruction- and data-level parallelism, making it appropriate as a coprocessor for high-throughput vision systems. Based on our study of [9], the algorithms are efficient and do not degrade the performance of the physical link. The framework supports different connection designs, which makes it well suited for computer vision applications. All the outcomes were tested on functional DSPs, which gives a more realistic measure of the communication in a hypercube network, and these algorithms can be extended to other, comparable topologies. Nevertheless, the communication in the framework is more complicated: the FPGA in the framework plays a central role in the connections, yet its performance is not discussed, and a dedicated board realizes the system's data input and output during development. All of the mentioned points can be studied further and will improve the model.

IV. RESULTS AND DISCUSSION

All the reviewed research works in the previous section have had a useful impact on the areas in which they were carried out. Also, they are not without disadvantages. Here, the reviewed studies are analyzed and evaluated. In our opinion, neural network techniques were successfully implemented in both of the first two works, [5] and [6]. In the first study, the machine's performance on NetTalk can be predicted within a very small margin of error; therefore, the machine's performance for any training data, set of workers, or network size can easily be found, and the same method can be used to predict the machine's performance for other applications. An assessment of the machine in terms of the MCUPS measure, speed-up, and efficiency was carried out; an efficient machine will train a very large data set while using as many processors as possible. In the second study, the first two approaches are independent of the learning rule. For further improvement in training speed, these approaches can be combined with standard backpropagation; likewise, a hybrid approach seems a natural extension. In the literature, a few other techniques have been presented, for example pipelining the tasks involved in training a multilayer perceptron and dividing the weights of a single multilayer perceptron among several PEs [10]. These techniques provide only a modest speed-up, but when combined with techniques requiring higher parallelism they may be effective, since they use only a few PEs.

V. CONCLUSION

A review was carried out on some recent works that studied MIMD computer architectures. Different applications were implemented on the different architectures in the literature, and the reviewed research works were analyzed and evaluated based on their implementations and experiments. In conclusion, we noticed that all of the studies have had a useful impact on the areas in which they were carried out; in contrast, they are not without disadvantages. In this article review, we concluded that distributed memory is much more useful than shared memory. In distributed memory, processes cannot communicate through shared data, so no monitor is required: communication is done by message passing, and synchronization is likewise provided by message passing. Shared memory carries performance overheads since, when a page is shared between two processing elements, it ping-pongs between them, which sharply degrades performance. Another advantage of distributed memory over shared memory is that the memory architecture is much simpler in the distributed case. A shared memory machine is a multiprocessor, while a distributed memory machine is a multicomputer. For future work, the addressed points can be improved in all the reviewed articles and implemented on different computer architectures.

VI. REFERENCES

[1]. Introduction to Parallel Processing, Cornell Theory Center Workshop, 1997.

[2]. M.J. Flynn, Very high-speed computing systems. Proceedings of the IEEE, 1966, 54(12), pp. 1901-1909.

[3]. S.K. Meesala, P.M. Khilar, and A.K. Shrivastava, Multiple Instruction Multiple Data (MIMD) Implementation on Clusters of Terminals.

[4]. M.W. Azeem, A. Tariq, and A.U. Mirza, A Review on Multiple Instruction Multiple Data (MIMD) Architecture. In International Multidisciplinary Conference, 2015.

[5]. H.M. Abbas, Performance of the Alex AVX-2 MIMD architecture in learning the NetTalk database. IEEE Transactions on Neural Networks, 2004, 15(2), pp. 505-514.

[6]. J. Tanomaru, S. Omichi, and A. Azuma, General purpose MIMD computers and neural networks: three case studies. In 1995 IEEE International Conference on Systems, Man and Cybernetics: Intelligent Systems for the 21st Century, 1995, Vol. 5, pp. 4587-4592. IEEE.

[7]. A. Yazdanbakhsh, K. Samadi, N.S. Kim, and H. Esmaeilzadeh, GANAX: A unified MIMD-SIMD acceleration for generative adversarial networks. In 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), 2018, pp. 650-661. IEEE.

[8]. A. Nieto, D.L. Vilariño, and V.M. Brea, PRECISION: A reconfigurable SIMD/MIMD coprocessor for Computer Vision Systems-on-Chip. IEEE Transactions on Computers, 2015, 65(8), pp. 2548-2561.

[9]. H. Liu, P. Jia, L. Li, and Y. Yang, Communication in a hybrid multi-layer MIMD system for computer vision. In 2011 4th International Congress on Image and Signal Processing, 2011, Vol. 4, pp. 1855-1859. IEEE.

[10]. J. Torresen, S.L. Mori, H. Nakashima, S. Tomita, and O. Landsverk, Parallel back propagation training algorithm for MIMD computer with 2D-torus network. In Proceedings of the 3rd Workshop (PCW'94), October 1994.
