
Journal of Xi'an University of Architecture & Technology, ISSN No: 1006-7930

Architecture and Advantages of SIMD in Applications

Sarah M. Al-sudany, Department of Engineering, University of Technology, Baghdad, Iraq. E-mail: [email protected]

Ahmed S. Al-Araji, University of Technology, Baghdad, Iraq. E-mail: [email protected]

Bassam M. Saeed, Department of Computer Engineering, University of Technology, Baghdad, Iraq. E-mail: [email protected]

Abstract— In this paper, we describe the single instruction multiple data (SIMD) architecture, a form of parallelism that most modern processor designs include in order to increase computer performance. The aims of this work are to describe the classification of SIMD architecture in computer systems, whose effectiveness depends on the implementation when execution time matters, and to exploit the efficiency of SIMD in multi-core and multi-processing computer systems so that programs achieve better performance. This is done by studying the basic principles of SIMD architecture, examining the two types of SIMD processing, array processing and vector processing, and identifying the advantages and disadvantages of each, as well as focusing on the two forms of SIMD architecture, true and pipelined SIMD. The paper also provides an overview of the characteristic multimedia extensions of SIMD and then analyses the use of these extensions in applications that need faster processing, such as digital signal processing (DSP), image processing and mobile applications.

Keywords— Array Processing Architecture, SIMD Architecture, Vector Processing Architecture.

I. INTRODUCTION

Computer architecture has usually been driven by demands for higher performance, and several design methods have been used to exploit the many forms of parallelism found in applications [1]. Computer engineering has achieved technological developments in recent years that have had a great mutual impact, especially in the spread of single-instruction multiple-data (SIMD) processing [2]. For example, in the early 1970s the first SIMD instructions were used in the CDC Star-100 and TI ASC, which could perform the same operation on a batch of data. A new era in applying SIMD to data in parallel began with the Thinking Machines CM-1 and CM-2, which are considered highly parallel processors. Many desktop computers nowadays perform tasks such as video processing and gaming in real time on bundles of digital information, so companies tried to bring this architecture to desktop computers [3]. Sun Microsystems introduced SIMD integer instructions in the 1995 UltraSPARC I microprocessor through the VIS (visual instruction set) extension, and MIPS introduced MDMX (MIPS Digital Media eXtension). By adding the MMX extensions to the x86 architecture in 1996, Intel made SIMD widely available. Motorola then implemented AltiVec for the PowerPC architecture, which was also used in POWER systems by IBM [4]. This prompted Intel's response, SSE, and SSE and its extensions are now more widely used than the others. The purpose of introducing SIMD extensions is that they apply especially well to common tasks such as adjusting the contrast of a digital image, adjusting audio volume, and other multimedia operations [5]. Most modern CPU designs include SIMD instructions to improve multimedia performance. The SIMD capabilities of processors must also be taken into account during standardization; for example, a reasonable intermediate calculation accuracy is determined and the sample dependencies are handled with explicit effort.


Parallel computer systems execute the same method on different pieces of data [6]. A SIMD accelerator therefore only needs interchangeable working modules with a very basic control mechanism; its improved performance and low complexity come from operating, as the name suggests, with a single instruction on several data elements in parallel [7]. The first step in introducing SIMD is thus to encode, at compile time, the data elements that can be operated on in parallel into vector instructions. The code is reviewed in order to identify groups of instructions that execute the same operation on different data [8]; such instructions are combined into a common instruction, called a vector/SIMD instruction, according to the instruction format expected by the SIMD accelerator [9], and the vectorized instructions are then executed on the SIMD accelerator. Some of the resulting benefits are described below; the short code sketch placed just before Section 2.1 illustrates the first of them.

• The total number of instructions executed by a program is reduced, because a single vector instruction encodes several operations. This lowers the pressure on instruction-fetch capacity, improves the instruction-cache hit rate and improves performance [10].
• Fewer instructions also reduce the amount of work the front end of the processor has to perform: fewer instructions need to be fetched, decoded and scheduled, and the back end retires fewer instructions. This mechanism improves energy efficiency [11].
• Different operations packed into a single instruction allow a broad variety of operations to be carried out efficiently.

The design of the SIMD architecture therefore gives computational designs several benefits, such as execution-time performance, scalability with data size, cost saving and concurrency. The huge performance gains of the SIMD technique are most visible in graphics applications, but many other problems can benefit from it as well; even in consumer applications, being careful about how memory and the CPU are used can bring enormous benefits to the user experience. This paper is organized as follows: Section II describes the basic principles of SIMD architecture; Section III introduces the implementation of SIMD architecture; Section IV explains the multimedia applications; conclusions are given in Section V.

II. BASIC PRINCIPLES OF SIMD ARCHITECTURE

When researching SIMD architecture, it is necessary to know the rules by which it works. Because the core operation of a SIMD component depends on parallelism, it is important to understand SIMD processing. This research therefore investigates attached array processors and SIMD array processors, in particular the terms vector processing and array processing, and focuses on the differences between these architectures with regard to their distinct applications, so that it then becomes easy to classify the applications that use the SIMD architecture.
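To make the principle concrete before the individual organisations are examined, the minimal C sketch below contrasts a scalar loop with an SSE-intrinsics version of the same element-wise addition; the vector version issues roughly a quarter of the arithmetic instructions, which is the instruction-count and instruction-cache benefit listed above. It assumes an x86 target with SSE support and is illustrative only, not the method of any cited work.

```c
#include <stdio.h>
#include <xmmintrin.h>   /* SSE intrinsics; assumes an x86 target */

#define N 8              /* kept a multiple of 4 so no tail loop is needed */

/* Scalar version: one addition (plus its loads/stores) per element. */
static void add_scalar(const float *a, const float *b, float *c) {
    for (int i = 0; i < N; ++i)
        c[i] = a[i] + b[i];
}

/* SIMD version: one vector addition covers four elements, so roughly a
   quarter of the dynamic instructions are fetched and decoded. */
static void add_simd(const float *a, const float *b, float *c) {
    for (int i = 0; i < N; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
}

int main(void) {
    float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[N] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[N];
    add_scalar(a, b, c);
    add_simd(a, b, c);
    for (int i = 0; i < N; ++i) printf("%.0f ", c[i]);   /* all 9s */
    printf("\n");
    return 0;
}
```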
2.1 Data Parallelism

Data parallelism aims to improve processing speed by spreading a data set across overlapping computing streams. For example, consider a customer-address consolidation process that gathers an address and tries to turn it into a standard form. This function is well suited to data parallelism: it can be sped up by roughly a factor of 8 by running eight standardisation processes side by side and streaming a portion of the records to each one, as in Fig. 1.

Figure 1. The data parallelism in SIMD.
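The address-standardisation example can be sketched as follows. The standardize_address routine and its body are purely illustrative assumptions; the point is that every record is independent, so the eight streams of Fig. 1 can be distributed across SIMD lanes, cores or processing elements without coordination. The OpenMP pragma is one optional way of expressing that independence and is simply ignored by compilers built without OpenMP.

```c
#include <stdio.h>

#define RECORDS 8

/* Stand-in for the address-standardisation step described in the text;
   the name and body are illustrative assumptions, not a real library call. */
static int standardize_address(int raw) {
    return raw | 0x100;              /* placeholder transformation */
}

int main(void) {
    int raw[RECORDS]  = {1, 2, 3, 4, 5, 6, 7, 8};
    int norm[RECORDS];

    /* Each iteration touches a different record and none depends on
       another, so the eight streams can be split across SIMD lanes,
       cores or separate processing elements, as Fig. 1 suggests. */
    #pragma omp parallel for
    for (int i = 0; i < RECORDS; ++i)
        norm[i] = standardize_address(raw[i]);

    for (int i = 0; i < RECORDS; ++i) printf("%d ", norm[i]);
    printf("\n");
    return 0;
}
```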


This approach achieves a more precise and robust form of parallelism by continuously applying the same limited set of tasks to several data sources, increasing overall output. Big gains can be made on today's processors if it is known when and how to apply SIMD. As with all performance improvements, gains should be measured on a typical target device before the code is put into production. Although some kinds of algorithms can use SIMD with minimal code changes, custom algorithms bring additional complexity, so this trade-off should be weighed against the expected performance gains [12].

2.2 SIMD Processing

Array processors have also been called multiprocessors or vector processors at times; they operate on broad collections of data and can boost computer efficiency. There are two types of array processors:

• Attached array processors.
• SIMD array processors.

An attached array processor is a processor attached to a general-purpose computer whose function is to boost and enhance the computer's performance in numerical computation; it achieves high performance by running multiple functional units in parallel. This review, however, is interested in the second type: a single computer system that has multiple processing units running in parallel. The processing units are built to operate under the control of a common control unit, which provides a single instruction stream driving multiple data streams. Fig. 2 shows the general block diagram of an array processor.

Figure 2. General block of an array processor.

It comprises a collection of identical processing elements (PEs), each with a local memory M. Each PE contains an ALU and registers. The master control unit supervises the operations of all processing elements [13]; it decodes the instructions and determines how they will be executed, while the program itself can be stored in the main memory. The control unit is responsible for the instruction flow. There are two different types of SIMD array-processor architecture, and each type has its own way of issuing instructions. Below, we shed light on some of these characteristics and highlight the main differences between the SIMD array types.

2.2.1 Array Processing Architecture

The diagram of an array-processor architecture, a single-instruction multiple-data organisation, is shown in Fig. 3. This architecture is often called a "two-dimensional" array or "matrix" organisation, since the matrix implemented by the two-dimensional processor array is its basic form. The working mechanism of the array processor is that the CPU issues a single instruction, which is then applied to a number of data items simultaneously. The structure relies on all data sets being operated on by the same instructions; if the data sets depend on each other, parallel processing may not be possible. The array processor thus contributes effectively to increasing processing speed relative to the total number of instructions, and one of its advantages is that it relieves the bottleneck problem of the vector processor.

Figure 3. An array processor architecture.
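A small simulation of the organisation in Figs. 2 and 3 may help: each processing element owns a local memory and an accumulator register, and a broadcast routine plays the role of the master control unit, issuing one instruction that every PE executes in lockstep on its own data. The structure and sizes below are illustrative assumptions, not a model of any particular machine.

```c
#include <stdio.h>

#define NUM_PE   4      /* number of processing elements */
#define LOCAL_SZ 8      /* words of local memory per PE  */

/* Each PE has its own local memory and an accumulator register,
   loosely mirroring the PE + local memory blocks of Fig. 2. */
typedef struct {
    float mem[LOCAL_SZ];
    float acc;
} PE;

/* The "control unit": broadcasts one ADD instruction, which every PE
   executes on its own local operand (conceptually simultaneously). */
static void broadcast_add(PE pe[], int addr, float operand) {
    for (int i = 0; i < NUM_PE; ++i)
        pe[i].acc = pe[i].mem[addr] + operand;
}

int main(void) {
    PE pe[NUM_PE];
    for (int i = 0; i < NUM_PE; ++i)
        pe[i].mem[0] = (float)(i + 1);        /* different data per PE */

    broadcast_add(pe, 0, 10.0f);              /* single instruction, many data */

    for (int i = 0; i < NUM_PE; ++i)
        printf("PE%d: %.1f\n", i, pe[i].acc); /* 11, 12, 13, 14 */
    return 0;
}
```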


In [14], a multi-dimensional vector instruction was proposed to speed up the nested-loop operations that are often found in video applications, while [15] implemented a SIMD system-on-chip model for FPGAs, using IP replication and FPGA programmability to achieve high performance at low cost. From previous studies, a disadvantage of the SIMD array processor is that its instruction structure is very complicated.

2.2.2 Vector Processing Architecture

A SIMD vector processor is a processor that can apply a single instruction to a group of elements (a vector), completing vector instructions instead of operating on one element at a time; it issues instructions that operate on vectors rather than on single data values. Depending on how the operands are retrieved, vector processors can be classified into two types: memory-memory architectures and register-register architectures. In the memory-memory architecture, operands are streamed directly from memory to the functional units and the results are written back to memory as the vector operation proceeds [16]. In the register-register architecture, operands and results pass indirectly to and from memory through a large set of vector registers [17]. VENICE is a SIMD vector processor that reduces the bandwidth needed for fetching and decoding, since fewer instructions are fetched, and it exploits data parallelism in large multimedia applications [18]. VEGAS is a soft vector processor that reads and writes directly in memory instead of through a vector register bank, which is a more efficient form of storage than the register bank [19]. From previous studies, the general drawback of vector processing is its ineffectiveness when dealing with irregular parallelism; memory can easily become a bottleneck, especially if data is not appropriately mapped to the memory banks. One way to address this problem is a hybrid sparse matrix-vector multiplication (SpMV) method that optimizes the SpMV bottleneck on SIMD processors [20], and improved programmability also helps. Advantages of the vector processor are that its configuration is portable to any FPGA architecture, scalable to higher-performance designs, and flexible [21]. The difference between the array and vector processors is shown in Fig. 4.

Figure 4. The diagram of array vs. vector processors.
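The sparse matrix-vector multiplication mentioned above is a good illustration of the irregular-access problem. In the plain C sketch below (compressed sparse row storage, an assumed layout), the indexed load x[col[j]] is a gather that maps poorly onto vector lanes and memory banks, which is exactly the bottleneck that hybrid SpMV schemes such as [20] try to remove.

```c
#include <stdio.h>

/* Sparse matrix-vector multiplication y = A*x with A stored in CSR form.
   The indirect access x[col[j]] is the irregular, gather-like pattern
   that limits straightforward SIMD vectorization. */
static void spmv_csr(int rows, const int *row_ptr, const int *col,
                     const float *val, const float *x, float *y) {
    for (int i = 0; i < rows; ++i) {
        float sum = 0.0f;
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
            sum += val[j] * x[col[j]];   /* irregular load from x */
        y[i] = sum;
    }
}

int main(void) {
    /* 3x3 example matrix: [[4,0,1],[0,2,0],[3,0,5]] */
    int   row_ptr[] = {0, 2, 3, 5};
    int   col[]     = {0, 2, 1, 0, 2};
    float val[]     = {4, 1, 2, 3, 5};
    float x[]       = {1, 1, 1}, y[3];

    spmv_csr(3, row_ptr, col, val, x, y);
    printf("%.0f %.0f %.0f\n", y[0], y[1], y[2]);  /* 5 2 8 */
    return 0;
}
```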

III. IMPLEMENTATION OF SIMD ARCHITECTURE

In general, there are two types of SIMD architecture: the first is the true SIMD architecture and the second is the pipelined SIMD architecture, and SIMD architecture works with two types of memory, shared and distributed. The true SIMD architecture can be built on either shared or distributed memory and has a single control unit. In the distributed-memory organisation, every processing element has its own local memory and must cooperate with the other processing elements. The process starts with an instruction provided by the control unit to the processors, after which every processor acts simultaneously as an arithmetic unit [22]. The control unit is the component that connects each processing element with the other processors in the same SIMD architecture; it assists a specific processing element in acquiring information from the other processing elements [23] and is responsible for handling this kind of information transfer between them. In the shared-memory organisation, the processors communicate with each other through memory over an interconnection network [24]. The diagram of a true SIMD architecture is shown in Fig. 5. In the pipelined SIMD architecture, the control unit sends instructions to each processing element, and the elements operate on shared memory at different stages. The control unit is responsible for sending instructions in different streams to the processing elements; each element takes on different processing work, operates on streams of data, and performs all the operations of an arithmetic unit [25]. The architecture links several arithmetic logic units to memory, and the data must be prepared and stored in different memory units so that the pipeline can be fed with this information as quickly as possible. The diagram of a pipelined SIMD architecture is shown in Fig. 6.


Figure 5. True SIMD architecture.

Figure 6. Pipeline SIMD architecture [26].
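The behaviour of a pipelined organisation can be imitated in a few lines of C: the data stream flows through load, execute and write-back stages, and once the pipeline is full one result retires per simulated cycle. This is only a toy cycle-level sketch of the idea, not a model of the architecture in Fig. 6.

```c
#include <stdio.h>

#define N      8
#define STAGES 3        /* load -> execute -> write back */

int main(void) {
    float in[N] = {1, 2, 3, 4, 5, 6, 7, 8}, out[N];
    float stage1 = 0.0f, stage2 = 0.0f;   /* pipeline latches */
    int cycles = 0;

    /* After STAGES cycles the pipeline is full and one result
       retires per cycle; reads happen before the latch is overwritten. */
    for (int c = 0; c < N + STAGES - 1; ++c, ++cycles) {
        if (c >= 2) out[c - 2] = stage2;        /* stage 3: write back */
        if (c >= 1) stage2 = stage1 * 2.0f;     /* stage 2: execute    */
        if (c < N)  stage1 = in[c];             /* stage 1: load       */
    }

    printf("%d elements processed in %d cycles\n", N, cycles);  /* 8 in 10 */
    for (int i = 0; i < N; ++i) printf("%.0f ", out[i]);
    printf("\n");
    return 0;
}
```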

The main drawback of the pipelined organisation is the execution time spent by the control unit on handling data transfers. The main disadvantage of the shared-memory organisation is that if the memory is to be expanded, every module (memory and processing elements) has to be added and configured separately; nevertheless, this organisation remains advantageous because it improves execution time and information can be transferred more freely without going through the controller [27]. The main drawback of the distributed-memory organisation is the execution time during which the controller has to handle the data transfer [28].

IV. ADVANTAGES OF SIMD ARCHITECTURE FOR MULTIMEDIA APPLICATIONS

The importance of SIMD stems from its being a source of many applications, whether in opening new uses of technology or in re-evaluating the traditional use of processors. In this section, we highlight a number of applications that have used the properties of SIMD in their implementation. The advantage of SIMD for these applications stems from the need for faster execution, lower energy consumption and a cheaper price, so we shed light on three uses of SIMD and show how they have helped processors work efficiently [29].

4.1 Digital Signal Processing

Digital signal processing (DSP) is a relatively recent technological contribution; it applies SIMD to signals such as sound, video, temperature, pressure and position that are processed mathematically. A DSP is designed to carry out mathematical functions such as addition, subtraction, multiplication and division very quickly and simultaneously. The complexity of implementing such systems is increasing, changes in standards must be taken into consideration during the development phase, and efficiency is one of the most important criteria: devices should deliver high performance with small battery consumption [30]. The difference between a conventional design and a digital signal processing system is shown in Fig. 7. The customisation features for DSPs include SIMD support to increase SIMD utilisation efficiency in terms of high performance and low battery consumption, operation aids to speed up processing, and a fast programmable crossbar to support complex data operations as well as area and energy improvements [31, 32].
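The core DSP kernel described above, a chain of multiply-accumulate operations, maps naturally onto SIMD. The sketch below computes one FIR (moving-average) output sample with SSE intrinsics, processing four taps per instruction; the tap count and coefficients are illustrative assumptions and the code assumes an x86 target.

```c
#include <stdio.h>
#include <xmmintrin.h>   /* SSE intrinsics; assumes an x86 target */

#define TAPS 8           /* multiple of 4 to keep the sketch short */

/* One FIR output sample: a chain of multiply-accumulate operations,
   the classic DSP kernel. Four taps are processed per SSE instruction. */
static float fir_sample(const float *x, const float *h) {
    __m128 acc = _mm_setzero_ps();
    for (int k = 0; k < TAPS; k += 4) {
        __m128 xv = _mm_loadu_ps(x + k);            /* 4 input samples */
        __m128 hv = _mm_loadu_ps(h + k);            /* 4 coefficients  */
        acc = _mm_add_ps(acc, _mm_mul_ps(xv, hv));  /* 4 MACs at once  */
    }
    float tmp[4];
    _mm_storeu_ps(tmp, acc);                        /* horizontal sum  */
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}

int main(void) {
    float h[TAPS] = {0.125f, 0.125f, 0.125f, 0.125f,
                     0.125f, 0.125f, 0.125f, 0.125f};
    float x[TAPS] = {1, 2, 3, 4, 5, 6, 7, 8};
    printf("y = %f\n", fir_sample(x, h));           /* moving average = 4.5 */
    return 0;
}
```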


Figure 7. Difference between a conventional design and a digital signal processing system.

4.2 Image Processing

The speed of image processing can be greatly increased using SIMD instructions. Each image is made up of many pixels that are stored sequentially in memory, and each pixel consists of one or more 8-bit integer values indicating the intensity of the colour channel(s): a black-and-white image has one colour channel, while colour images usually have three. On a processor without SIMD, image processing with 8-bit values fills only a quarter of a standard 32-bit register; operations nevertheless run on the full 32 bits, so 24 bits of work are performed unnecessarily [33]. Because image algorithms are largely linear, the result for one pixel does not affect other pixels. SIMD takes advantage of this parallelism by packing multiple consecutive pixels into one register and processing them simultaneously. In theory, SIMD instructions can produce a four- to eight-fold acceleration when used for image processing; in practice they have been shown to provide speed-ups of 1.25 to 2 times in video processing algorithms [34]. This is much less than the theoretical four-fold acceleration, but it is still significant for some algorithms. Accelerated algorithms can also change power consumption: if the processor completes the task much faster, it has more time to go into a low-power mode, which results in lower power consumption, whereas if the processor stays fully active, power consumption during that time may be higher. Accelerating an algorithm can therefore greatly affect the end user through faster processing and reduced energy consumption [35]. The emerging need to process the large data sets of high-resolution image-processing applications calls for faster, more configurable and more power-efficient systems with shorter design times; FPGAs can therefore play an important role in providing the scalability, configurability and concurrency needed to meet the required application levels [36].
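The pixel-level argument above can be seen directly in code: with SSE2, sixteen 8-bit pixels fill one 128-bit register, and a single saturating-add instruction brightens all of them at once instead of wasting the upper bits of a 32-bit scalar register. The brightness offset and pixel values below are arbitrary illustrative choices; the code assumes an x86 target with SSE2.

```c
#include <stdio.h>
#include <emmintrin.h>   /* SSE2 intrinsics; assumes an x86 target */

#define W 16             /* 16 eight-bit pixels fill one 128-bit register */

int main(void) {
    unsigned char pixels[W], out[W];
    for (int i = 0; i < W; ++i)
        pixels[i] = (unsigned char)(i * 16);       /* synthetic gradient */

    __m128i p     = _mm_loadu_si128((const __m128i *)pixels);
    __m128i delta = _mm_set1_epi8((char)40);       /* brightness offset */

    /* Saturating add: 16 pixels brightened by one instruction; values
       clamp at 255 instead of wrapping around. */
    __m128i q = _mm_adds_epu8(p, delta);
    _mm_storeu_si128((__m128i *)out, q);

    for (int i = 0; i < W; ++i) printf("%u ", out[i]);
    printf("\n");
    return 0;
}
```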

4.3 Mobile Applications

The demand for mobile video applications is increasing rapidly, especially on wireless mobile platforms. Improving instruction-set architectures and using SIMD is a logical way to achieve higher performance in portable multimedia applications [37]. Intel® Wireless MMX™ technology was designed to accelerate multimedia processing and applications in an energy-efficient manner. On modern devices, viewing and decoding multimedia content can consume a lot of energy, and the power required to decode audio or video depends on the computational complexity of the coding or compression algorithms used [38]. The SIMD help framework that relies on offloading multimedia applications among fog, computers and mobile devices is shown in Fig. 8. The SIMDOM (SIMD offloading for mobile) framework consists of five modules:

• A SIMD module for ARM intrinsic functions that enables implementation of ARM-based multimedia applications on the x86 ISA.
• An application profiler for static application analysis, which determines the percentage of SIMD instructions generated.
• A power profile generator that measures the system's power parameters for the offload manager.
• A network profile generator that monitors the network status.
• An offload manager that evaluates the prospect of code offloading based on the inputs from the other modules and acts as the communication manager for code migration.
The SIMDOM framework takes the application sources, classifies them for the ARM and x86 ISAs on the cloud server, and then compiles the SIMD code for x86. The resulting application binaries, with their SIMD instructions identified, make offloading feasible based on input from the application power profiles and the network profile; translation is required because of the differences in instruction length and register sizes between the ISAs [40, 41, 42].
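At the instruction level, the correspondence that SIMDOM-style offloading relies on can be pictured with a small sketch: the same four-lane vector addition is written once with ARM NEON intrinsics and once with x86 SSE intrinsics, selected at compile time. This is only an illustration of the one-to-one mapping between the ISAs, not the framework's actual translation mechanism.

```c
#include <stdio.h>

/* The same four-lane vector addition expressed with the native SIMD
   intrinsics of each ISA; register widths and encodings differ, but
   the operation maps directly from NEON to SSE. */
#if defined(__ARM_NEON)
#include <arm_neon.h>
static void vec_add4(const float *a, const float *b, float *c) {
    vst1q_f32(c, vaddq_f32(vld1q_f32(a), vld1q_f32(b)));        /* NEON */
}
#else
#include <xmmintrin.h>   /* fallback assumes an x86 target with SSE */
static void vec_add4(const float *a, const float *b, float *c) {
    _mm_storeu_ps(c, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b))); /* SSE */
}
#endif

int main(void) {
    float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, c[4];
    vec_add4(a, b, c);
    printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);  /* 11 22 33 44 */
    return 0;
}
```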


Figure 8. The SIMD help framework that relies on offloading multimedia applications.

V. CONCLUSION

An overview of SIMD computer architectures has been presented in this paper. It has covered the advantages and disadvantages of the array- and vector-processing organisations within the SIMD architecture. SIMD architecture is used for solving problems that involve a huge number of computations, with all the processing elements working in parallel at the same time. The multimedia SIMD extensions support multi-core operation and are highly used in fields such as image processing, digital signal processing and mobile applications, because the SIMD architecture gives computational methods several improvements in terms of execution-time performance, scalability with data size, cost saving, concurrency and energy consumption.

REFERENCES

[1] J. Goodacre, and A. Sloss, "Parallelism and the ARM Instruction Set". Computer, Vol. 38, pp. 42-50, 2005.
[2] S. Tanabe, T. Nagashima, and Y. Yamaguchi, "A study of an FPGA based flexible SIMD processor". ACM SIGARCH News, Vol. 39, No. 4, pp. 86-89, 2011.
[3] R. Espasa, M. Valero, and J. E. Smith, "Vector Architectures: Past, Present and Future". International Conference on Supercomputing, pp. 425-432, 1998.
[4] N. T. Slingerland, and A. J. Smith, "Multimedia extensions for general purpose microprocessors: a survey". Microprocessors and Microsystems, Vol. 29, No. 5, pp. 225-246, 2005.
[5] A. Shahbahrami, B. Juurlink, and S. Vassiliadis, "A Comparison between Processor Architectures for Multimedia Applications". In Proc. 15th Annual Workshop on Circuits, Systems and Signal Processing (ProRISC), pp. 139-152, 2004.
[6] C. Chi, M. Alvarez-Mesa, B. Bross, B. Juurlink, and T. Schierl, "SIMD Acceleration for HEVC Decoding". IEEE Transactions on Circuits and Systems for Video Technology, Vol. 25, No. 5, pp. 841-855, 2015.
[7] S. Che, B. Beckmann, S. Reinhardt, and K. Skadron, "Pannotia: Understanding irregular GPGPU graph applications". 2013 IEEE International Symposium on Workload Characterization (IISWC), pp. 185-195, 2013.
[8] P. Indira, and M. Kamaraju, "Design and Implementation of 6-Stage 64-bit MIPS Pipelined Architecture". International Journal of Engineering and Advanced Technology, Vol. 8, pp. 790-796, 2019.
[9] L. Petrica, R. Hobincu, and C. Bira, "A Light-Weight and Flexible Programming Environment for Parallel SIMD Accelerators". Romanian Journal of Information Science and Technology, Vol. 16, No. 4, pp. 336-350, 2013.
[10] V. Porpodas, and T. Jones, "Throttling Automatic Vectorization: When Less Is More". International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 432-444, 2015.
[11] A. Brankovic, K. Stavrou, E. Gibert, and A. González, "Accurate Off-Line Phase Classification for HW/SW Co-Designed Processors". In Proceedings of the ACM International Conference on Computing Frontiers (CF '14), Cagliari, No. 5, pp. 1-10, 2014.
[12] Y. Canqun, and C. Juan, "Optimizing SIMD Parallel Computation with Non-Consecutive Array Access in Inline SSE". In Intelligent Computation Technology and Automation, International Conference on, Zhangjiajie, Hunan, China, No. 12542705, pp. 254-257, 2012.
[13] J. Ryoo, K. Han, and K. Choi, "Leveraging parallelism in the presence of on CGRAs". 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC), No. 14117483, pp. 285-291, 2014.
[14] Y. Lo, . Lun, W. Wang, and J. Song, "Improved SIMD Architecture for High Performance Video Processors". IEEE Transactions on Circuits and Systems for Video Technology, Vol. 21, No. 12, pp. 1769-1783, 2011.
[15] M. Baklouti, Ph. Marquet, J.L. Dekeyser, and M. Abid, "FPGA-based many-core System-on-Chip design". Microprocessors and Microsystems, Vol. 39, pp. 302-312, 2015.
[16] S. Sivakrishna, and R. Yarrabothu, "Design and simulation of 5G massive MIMO kernel algorithm on SIMD vector processor". Conference on Signal Processing and Communication Engineering Systems (SPACES), Vijayawada, pp. 53-57, 2018.
[17] V. Prasanth, V. Sailaja, P. Sunitha, and B. Vasantha, "Design and implementation of low power 5 stage pipelined 32 bits MIPS processor using 28nm technology". International Journal of Innovative Technology and Exploring Engineering, Vol. 8, No. 4S2, pp. 503-507, 2019.


[18] A. Severance, and G. Lemieux, "VENICE: A Compact Vector Processor for FPGA Applications". 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines, Toronto, pp. 245-245, 2012.
[19] C. H. Chou, A. Severance, A. D. Brant, Z. Liu, S. Sant, and G. Lemieux, "VEGAS: Soft vector processor with ". Proceedings of the ACM/SIGDA 19th International Symposium on Field Programmable Gate Arrays, FPGA 2011, pp. 15-24, 2011.
[20] K. Zhang, S. Chen, Y. Wang, and J. Wan, "Breaking the performance bottleneck of sparse matrix-vector multiplication on SIMD processors". IEICE Electronics Express, Vol. 10, No. 9, pp. 1-7, 2013.
[21] Y. Peter, J. Steffan, and J. Rose, "VESPA: Portable, scalable, and flexible FPGA-based vector processors". Proceedings of the 2008 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, pp. 61-70, 2008.
[22] A. Barredo, M. Cebrián, J. Moretó, M. Casas, and M. Valero, "POSTER: An Optimized Predication Execution for SIMD Extensions". 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT), Seattle, WA, USA, pp. 479-480, 2019.
[23] G. Sornet, S. Jubertie, F. Dupros, F. De Martin, and S. Limet, "Performance Analysis of SIMD Vectorization of High-Order Finite-Element Kernels". 2018 International Conference on High Performance Computing & Simulation (HPCS), Orleans, pp. 423-430, 2018.
[24] J. Wang, J. Sohl, O. Kraigher, and D. Liu, "Software Programmable Data Allocation in Multi-bank Memory of SIMD Processors". The 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools, Lille, pp. 28-33, 2010.
[25] G. Kim, S. Park, K. Lee, Y. Kim, I. Hong, K. Bong, D. Choi, S. Shin, J. Park, and H. Yoo, "A task-level pipelined many-SIMD augmented reality processor with congestion-aware network-on- scheduler". 2014 IEEE COOL Chips XVII, Yokohama, pp. 1-3, 2014.
[26] R. Frijns, H. Fatemi, B. Mesman, and H. Corporaal, "DC-SIMD: Dynamic communication for SIMD processors". 2008 IEEE International Symposium on Parallel and Distributed Processing, Miami, FL, pp. 1-10, 2008.
[27] M. Azeem, A. Tariq, and A. U. Mirza, "A Review on Multiple Instruction Multiple Data (MIMD) Architecture". International Multidisciplinary Conference, 2015.
[28] M. Kaur, and R. Kaur, "A Comparative Analysis of SIMD and MIMD Architectures". International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 3, No. 9, pp. 1151-1156, 2013.
[29] G. Mitra, B. Johnston, A. Rendell, E. McCreath, and J. Zhou, "Use of SIMD Vector Operations to Accelerate Application Code Performance on Low-Powered ARM and Intel Platforms". 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum, Cambridge, MA, pp. 1107-1116, 2013.
[30] P. Zahradnik, and B. Šimák, "Education in real-time digital signal processing using digital signal processors". The 35th International Conference on Telecommunications and Signal Processing (TSP), Prague, pp. 625-628, 2012.
[31] S. Seo, M. Woh, S. Mahlke, M. Scott, T. Mudge, V. Sundaram, and C. Chakrabarti, "Customizing Wide-SIMD Architectures for H.264". Proceedings of the 2009 International Conference on Embedded Computer Systems: Architectures, Modelling and Simulation (IC-SAMOS 2009), Samos, Greece, pp. 172-179, 2009.
[32] A. Shahbahrami, and B. Juurlink, "SIMD Architectural Enhancements to Improve the Performance of the 2D Discrete Wavelet Transform". The 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools, DSD 2009, pp. 497-504, 2009.
[33] D. Barina, and P. Zemcik, "Diagonal vectorisation of 2-D wavelet lifting". The IEEE International Conference on Image Processing (ICIP), Paris, pp. 2978-2982, 2014.
[34] P. Enfedaque, F. Auli-Llinas, and J. Moure, "Strategies of SIMD Computing for Image Coding in GPU". The IEEE 22nd International Conference on High Performance Computing (HiPC), Bangalore, pp. 345-354, 2015.
[35] M. Amiri, F. Siddiqui, C. Kelly, R. Woods, K. Rafferty, and B. Bardak, "FPGA-Based Soft-Core Processors for Image Processing Applications". Published with open access at Springerlink.com, pp. 139-156, 2017.
[36] R. Woods, J. Mcallister, R. Turner, Y. Yi, and G. Lightbody, "FPGA-based implementation of signal processing systems". A John Wiley and Sons, Ltd., publication, pp. i-xxi, 2008.
[37] J. Pacheco, and S. Hariri, "IOT security framework for smart cyber infrastructures". IEEE International Workshops on Foundations and Applications of Self-Systems, Augsburg, Germany, pp. 242-247, 2016.
[38] M. Satyanarayanan, "A brief history of cloud offload: a personal journey from odyssey through cyber foraging to cloudlets". ACM SIGMOBILE Mobile Computing and Communications Review, Vol. 18, No. 4, pp. 19-23, 2015.
[39] A. Gani, K. Ko, K. Ko, S. Mustafa, S. Madani, and M. Khan, "SIMDOM: A Framework for SIMD Instruction Translation and Offloading in Heterogeneous Mobile Architectures". Transactions on Emerging Telecommunications Technologies, Vol. 29, No. 4, 2017.
[40] M. Altamimi, R. Palit, K. Naik, and A. Nayak, "Energy as a service on the efficacy of multimedia cloud computing to save energy". IEEE 5th International Conference on Cloud Computing (CLOUD), Hawaii, USA, pp. 764-771, 2012.
[41] E. Benkhelifa, T. Welsh, L. Tawalbeh, Y. Jararweh, and M. Al-Ayyoub, "Leveraging software-defined-networking for energy optimisation in mobile-cloud-computing". Procedia Computer Science, Vol. 94, pp. 479-484, 2016.
[42] A. Al-Araji, "Development of an On-Line Self-Tuning FPGA-PID-PWM Control Algorithm Design for DC-DC in Mobile Applications". Journal of Engineering, Vol. 23, No. 8, pp. 84-106, 2017.
