GVIP Journal, Volume 6, Issue 3, December, 2006

Implementation of Image Processing Operations Using Simultaneous Multithreading and Buffer Processing

A. K. Manjunathachari(1), K. Satya Prasad(2)
(1) G. Pulla Reddy Engineering College, Kurnool (India)
(2) JNTU College of Engineering, Kakinada (India)
[email protected].

Abstract
Typical real-time image processing applications require a huge amount of processing power and resources. Limitations arise in image processing systems due to the volume of image data to be processed, and this challenge is even more dominant when image processing applications are run in parallel. Parallel processing appears to be the only way to attain higher speed of operation in real time under resource constraints. The nature of processing in a typical image processing task ranges from large arithmetic operations to fewer, symbolic ones. Although the existing parallel computing systems provide parallelism for image processing to some extent, they fail to support image processing operations that vary at a large rate. As part of my thesis, this paper presents an implementation of image processing operations using simultaneous multithreading and a processing buffer, by bifurcating the parallel operation; the results are simulated in a standard LAN environment.

Keywords: Processing Buffer, Simultaneous multithreading, SIMD, MIMD

1. Introduction
The type of processing operations in a typical image processing task varies greatly. Generally three levels of image processing are distinguished to analyze and tackle an image processing application: low-level operations, intermediate-level operations, and high-level operations.
i. Low-level operations: Images are transformed into modified images. These operations work on whole image structures and produce an image, a vector, or a single value. The computations have a local nature; they work on single pixels in an image. Examples of low-level operations are: smoothing, convolution, histogram generation.
ii. Intermediate-level operations: Images are transformed into other data structures. These operations work on images and produce more compact data structures (e.g. a list). The computations usually do not work on a whole image but only on objects/segments (so-called areas of interest) in the image. Examples of intermediate-level operations are: region labeling, motion analysis.
iii. High-level operations: Information derived from images is transformed into results or actions. These operations work on data structures (e.g. a list) and lead to decisions in the application, so high-level operations can be characterized as symbolic processing. An example of a high-level operation is object recognition.
Image processing starts with a plain image, or a sequence of images (coming from a sensor), and, while processing, the type of operations moves from arithmetic (Floating Point Operations Per Second, FLOPS) to symbolic (Million Logic Inferences Per Second, MLIPS), and the amount of data to process is reduced until in the end some decision is made (image understanding). As may be obvious, image processing tasks require large amounts of (different types of) computations. When real-time requirements are to be met, normal (sequential) workstations are not fast enough. So more processing power is needed, and parallel processing seems to be an economical way to satisfy these real-time requirements. Besides, even when current workstations get fast enough to do the image processing tasks of today, parallel processing will offer more processing power and open new application areas to explore.
Many architectures have been proposed that try to exploit the available parallelism at different granularities. For example, pipelined processors [2, 9, 15] and multiple instruction issuing processors, such as the superscalar [1, 18] and VLIW [4, 7, 12] machines, exploit the fine-grain parallelism available at the instruction set level. In contrast, multiprocessors [8, 11, 13] exploit coarse-grain parallelism by distributing entire loop iterations to different processors. Each of these parallel architectures has significant differences in overhead, instruction scheduling constraints, memory latencies, and implementation details, making it difficult to determine which architecture is best able to exploit the available parallelism. The performance potential of multiple instruction issuing and its interaction with pipelining has been investigated by several researchers [10, 14, 16, 17]. Their work has shown that at


the basic level, pipelining and multiple instruction issuing are essentially equivalent in exploiting fine-grain parallelism. Studies using the PASM prototype have indicated that the multiprocessor organization may be outperformed by the SIMD organization [5, 6] unless special care is taken to provide efficient synchronization for the MIMD mode [6]. We extend this previous work by comparing the performance of a pipelined , a , and a shared-memory multiprocessor when executing scientific application programs.
In image processing operations the existing approaches to parallelism are constrained by the varying size of the data and the required resources. Hence a system is required for efficient control of image processing operations with variable data size. The proposed approach realizes a parallel processing architecture integrating the simultaneous multithreading (SMT) concept and the processing buffer (PB) concept for the proper control and execution of varying image processing applications. Section 2 discusses multithreading, SMT, and processing buffers. Section 3 overviews the approach to image processing using SMT and PB, with a discussion of template matching using the above technique, and finally concludes with the results and discussion.

2. Theory
Simultaneous multithreading is a technique that combines hardware multithreading with superscalar processor technology to allow multiple threads to issue instructions each cycle. Unlike other hardware multithreaded architectures (such as the Tera MTA), in which only a single hardware context (i.e., thread) is active on any given cycle, SMT permits all thread contexts to simultaneously compete for and share processor resources. Unlike conventional superscalar processors, which suffer from a lack of per-thread instruction-level parallelism, simultaneous multithreading uses multiple threads to compensate for low single-thread ILP. The performance consequence is significantly higher instruction throughput and program speedups on a variety of workloads that include commercial and web servers and scientific applications, in both multiprogramming and parallel environments.

Enhanced SMT features
To improve SMT performance for various workload mixes and provide robust quality of service, we added two features: dynamic resource balancing and adjustable thread priority.
• Reducing the thread's priority is the primary mechanism in situations where a thread uses more than a predetermined number of GCT entries.
• Inhibiting the thread's instruction decoding until the congestion clears is the primary mechanism for throttling a thread that incurs a prescribed number of L2 cache misses.
• Flushing all the thread's instructions that are waiting for dispatch and holding the thread's decoding until the congestion clears is the primary mechanism for throttling.

2.1 Processing Buffer Concept:
The idea behind distributed Processing Buffer processing is that it offers a performance improvement by reducing the processed data as well as (the option of) processing the data in parallel. The idea behind the PB is to combine the data reduction and parallel processing strategies. It is geared towards iterative image processing algorithms where only a subset of the image data is processed. The global steps in using PB processing are:
1. scan image to collect data of interest
2. put data to process in a PB
3. while PB not empty
   - process data in PB
   - put new(ly generated) data to process in PB
   endwhile

PB data structure:
A PB is defined as a data structure with two main access functions: a put() function to put data elements in the PB and a get() function to retrieve an arbitrary data element from the PB. Furthermore, there is an empty() function for checking whether the PB is empty and a clear() function for removing all data elements from the PB. Note that both get() and put() are blocking when the PB is empty or full, respectively. This definition of the access functions allows different implementation schemes for the PB data structure. For instance, on a workstation, the PB could be implemented as a linked list of data elements where elements are put in the list and fetched from the list in a LIFO (Last-In First-Out) or FIFO (First-In First-Out) manner. Yet, the user may not assume the PB behaves in a certain way, e.g. as a FIFO, and use that knowledge in his program.

    class PB
    {
    public:
        PB(unsigned int element_size);
        ~PB();
        boolean empty();
        void put(void *element);
        void get(void *element);
        void clear();
    };

Distributed PB data structure:
On a parallel system the PB data structure may be distributed over a number of processors. This distributed PB data structure is obtained by segmenting the PB into so-called partial PBs that are allocated on the processors, one on each processor. The distributed PB consists of n partial PBs, each of them uniquely mapped to a specific processor. Instead of allowing each processor to access all data elements in the PB, a processor is restricted to accessing only its own part of the PB, its partial PB, when fetching data from the distributed PB. When get() is called, the processor will only return data elements that are present in its partial PB.
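The PB interface above leaves the storage discipline open. The following single-threaded sketch is one possible realization, with several illustrative assumptions not fixed by the paper: a bounded FIFO backed by std::deque, a default capacity, and a full() helper that the declared interface does not include. It exercises the put()/get()/empty()/clear() contract together with the while-PB-not-empty processing loop of Section 2.1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <vector>

// Minimal single-threaded PB sketch.  The paper's PB stores untyped
// elements of a fixed size; FIFO order and the capacity bound are
// illustrative choices only -- callers must not rely on the order.
class PB {
public:
    explicit PB(unsigned int element_size, std::size_t capacity = 1024)
        : size_(element_size), cap_(capacity) {}
    bool empty() const { return q_.empty(); }
    bool full() const { return q_.size() >= cap_; }  // hypothetical helper
    void put(const void *element) {
        assert(!full());                 // a real PB would block when full
        const char *p = static_cast<const char *>(element);
        q_.emplace_back(p, p + size_);   // copy element_size bytes in
    }
    void get(void *element) {
        assert(!empty());                // a real PB would block when empty
        std::memcpy(element, q_.front().data(), size_);
        q_.pop_front();
    }
    void clear() { q_.clear(); }
private:
    unsigned int size_;
    std::size_t cap_;
    std::deque<std::vector<char>> q_;
};

// Global steps from Section 2.1: seed the PB, then process until empty;
// processing an item may generate new work that goes back into the PB.
int process_all(PB &pb) {
    int processed = 0;
    while (!pb.empty()) {
        int pixel;
        pb.get(&pixel);
        ++processed;
        if (pixel > 128) {               // "newly generated" work item
            int neighbour = pixel - 128;
            pb.put(&neighbour);
        }
    }
    return processed;
}
```

A real distributed PB would block inside put()/get() and partition the elements across the partial PBs; the asserts here only stand in for that blocking behaviour.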


3. Processing Buffer approach using SMT for Image Processing Operations
In this approach we present a method for the bifurcation of an image processing application into three fundamental layers, which are isolated based on processor requirements and their functionality. Generally, image processing applications perform parallel operations by taking additional resource support from libraries and packages and by creating buffering for performing the image processing application (IPA). The transition of control for creating buffers and controlling the applications takes a considerable amount of transfer time, which results in slower processing. We present an approach that enhances the parallelism by adding the concept of simultaneous multithreading over the processor, to reduce the transition delay in a parallel computing image processing application. The parallel architecture for parallel image processing is shown in figure 1 below.

Fig 1. The three-layer architecture: Resource layer (support drivers, buffers), Link layer (Library, DLL, Mex), Application layer (programs, I/O).

Resource layer:
This layer keeps track of all the hardware resource requirements, such as device drivers and processing buffers, for performing multiple IPAs. This layer communicates with the application layer via the linking layer to find the requirements of the IP applications, so as to allocate processing buffers to carry out simultaneous operations.

Linking layer:
This layer provides a link between the resource layer and the application layer; it consists of DLL files and MEX files for the proper transfer of data between the resource allocation unit and the computing unit. This layer holds the defined libraries and the packages required for supporting the transactions.

Application layer:
This layer reads the input image and the dedicated functions of the IPA with the support of the upper layers. This layer evaluates the time of computation and the resource requirements for the IPA. A copy of the requirements is transferred to the resource layer for allocation of resources. This layer is the user interface where the user can pass the inputs to be processed on the image and obtain the results. The transactions in these layers are controlled by the simultaneous multithreading approach, where the instructions are latched out into multiple threads and executed concurrently. Finally, in simultaneous multithreading (SMT), as in other multithreaded implementations, the processor fetches instructions from more than one thread. What differentiates this implementation is its ability to schedule instructions for execution from all threads concurrently. With SMT, the system dynamically adjusts to the environment, allowing instructions to execute from each thread if possible, and allowing instructions from one thread to utilize all the execution units if the other thread encounters a long-latency event.

3.1 Template Matching using parallel execution
Template matching is one of the most fundamental tasks in many image processing applications. It is a simple method for locating specific objects within an image, where the template contains the object one is searching for. For each possible position in the image the template is compared with the actual image data in order to find the sub-images that match the template. To reduce the impact of possible noise and distortion in the image, a similarity or error measure is used to determine how well the template compares with the image data. A match occurs when the error measure is below a certain predefined threshold.

Algorithm for template matching:
Phase 1. First, each input image from file is scattered throughout the parallel system depending on the number of systems (CPUs).
Phase 2. Next, all templates are broadcast to the CPUs; also, for the convolution operations to perform correctly, image borders are exchanged among neighbours in the CPUs. In all cases the extent of the border in each dimension is half the size of the template minus one pixel.
Phase 3. Finally, before the error image is written out to file, it is gathered to a single unit. (Apart from these communication operations all processing units can run independently, in a fully data-parallel way.)

3.2 Parallel Image Convolution Algorithm
Image convolution is a primitive operation in image processing and computer vision [4]. It has plenty of applications across image processing and computer vision, such as image filtering, feature detection, enhancement, restoration, template matching, and so on. It is defined by

    (a[x, y] * g[x, y]) = Σ_{i,j} a[i, j] g[x − i, y − j]    (1)

Convolution is a local operation because the outcome of convolution at each pixel is just the sum of multiplications between the pixels neighbouring that point in the image and the pixels in a kernel. Convolution is a time-consuming job because the amount of computation is very high, especially when the kernel size is large, which makes parallel processing a very attractive way of implementing it, given the local property of the convolution. There are two ways of implementing the convolution. One is a direct convolution following the definition above in the spatial domain, and the other is a 2D FFT method in the Fourier domain.

3.2.1 Parallel Direct Convolution
In a conventional parallel direct convolution method [22], the image is decomposed into as many row partitions as the processor count, as shown in Figure 2.


The complexity of the algorithms is evaluated by two measures: computation time Tcomp(n, k, p) and communication time Tcomm(n, k, p), where n is the image width, k is the kernel width (without loss of generality, we assume each of the image and kernel has the same width and height), and p is the processor count. Communication between processors occurs only in the boundary regions between partitions. Thus, the communication load at each processor will be 2·n·⌈k/2⌉. The computation and communication time of the parallel direct convolution algorithm can be expressed as follows:

    Tcomp(n, k, p) = O(k²n²/p)    (2)
    Tcomm(n, k, p) = O(τnk)       (3)

where τ represents the average network latency per communication. Therefore the ratio of computation to communication is O(kn/(τp)), which tells us scaling will decrease as kernel size or image size decreases.

Fig 2. Row decomposition of the image into partitions BP1, BP2, …, BPN (partition count = number of row processors).

3.2.2 Parallel 2D FFT Method
In the 2D FFT method, the image convolution is performed in the Fourier domain [21]. Since it includes 2D FFT and 2D inverse FFT operations, it has to go through several stages, as shown in Figure 3. In this method, each 1-D FFT operation is performed without any communication on each processor. However, the transpose step becomes a major bottleneck because it requires all-to-all communication. To avoid severe network contention, it should be done carefully. A parallel matrix transpose algorithm [23] is used. Given an n×n image and p processors, the image is partitioned into blocks of n/p × n/p size. Let the ith processor have blocks X_0^i, X_1^i, …, X_{p−1}^i, and let Y_k^i denote the transposed block which the ith processor will have in the kth block position. Then the parallel matrix transpose algorithm is given as follows.

Parallel Matrix Transpose Algorithm:
    for j = 1 to p − 1
        send block X_{(i+j)%p}^i to processor (i+j)%p
        receive block Y_{(i−j)%p}^i from processor (i−j)%p
    endfor

By the above algorithm, each processor sends and receives exactly one block at each step, so that its memory port does not suffer from a traffic jam.

Fig 3. Stages of the parallel 2D FFT convolution: 1D row FFT of the image and of the kernel (on partitions BP1 … BPN), transpose, 1D column FFT, point-wise multiplication, 1D column IFFT, and 1D row IFFT.

4. Results and conclusion
The above is simulated and the results are tested in a standard 100 Mbps LAN environment consisting of P-III 1.72 GHz systems with 256 MB RAM, under Linux.
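Before turning to the results, the send/receive schedule of the parallel matrix transpose above can be checked without any message passing. The sketch below (a hypothetical helper, not the paper's implementation) computes the step-j partners (i+j) mod p and (i−j) mod p and verifies that at every step each processor sends and receives exactly one block, which is the no-contention claim made above.

```cpp
#include <utility>
#include <vector>

// For step j of the parallel matrix transpose (Section 3.2.2), processor i
// sends block X_{(i+j)%p} to processor (i+j)%p and receives block
// Y_{(i-j)%p} from processor (i-j)%p.  This helper returns the
// (send_to, recv_from) pair; no actual communication is modelled.
std::pair<int, int> partners(int i, int j, int p) {
    int send_to = (i + j) % p;
    int recv_from = ((i - j) % p + p) % p;  // keep the index non-negative
    return std::make_pair(send_to, recv_from);
}

// Check that at every step each processor sends exactly one block and is
// the target of exactly one sender (so no memory-port contention).
bool schedule_is_contention_free(int p) {
    for (int j = 1; j <= p - 1; ++j) {
        std::vector<int> recv_count(p, 0);
        for (int i = 0; i < p; ++i)
            ++recv_count[partners(i, j, p).first];  // i sends to this rank
        for (int i = 0; i < p; ++i)
            if (recv_count[i] != 1) return false;
    }
    return true;
}
```

Note the schedule is self-consistent: the processor that i receives from, (i−j) mod p, computes its own send target as ((i−j)+j) mod p = i.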


4.1 Template Matching
Results were obtained for the parallel version of the algorithm presented above for non-real-time applications. These results show that even for a large number of processing units the speedup is close to linear, and that the characteristics are identical when the same number of templates is used in the matching process, as shown in Table 1a and Table 1b.

Table 1a: Parallel execution, 1 input image
No of Systems in LAN | 1 template (s) | 5 templates (s) | 10 templates (s)
2 | 25.439 | 126.654 | 253.165
3 | 12.774 | 63.41   | 126.694
4 | 6.449  | 31.895  | 63.707

Table 1b: Sequential execution, 1 input image
No of Systems in LAN | 1 template (s) | 10 templates (s)
2 | 25.526 | 253.627
3 | 13.466 | 133.443
4 | 7.126  | 69.924

We can see that the execution time for SMT using the PB is better than for fine-grain multithreading (FGM) and coarse-grain multithreading (CGM). Fig 4 shows that, as the number of threads increases with the image size kept fixed, the execution time of SMT-based low-pass filtering is better than that of FGM-based and CGM-based low-pass filtering.

Fig 4. Execution times for SMT, FGM and CGM

Tables 2 and 3 present the number of elements in the processing buffer at the start of each image processing iteration, and the execution times for SMT, CGM and FGM.

Table 2: Entries in processing buffers
Image Name | Entries in processing buffers
Flower  | 1500
TUD     | 3800
Obscura | 200
Trui    | 9000
Cermet  | 10000

Table 3: Execution time on various image sizes
Image size | SMT (ms) | CGM (ms) | FGM (ms)
64*64   | 20  | 25  | 28
256*256 | 200 | 220 | 250
400*400 | 220 | 245 | 260

4.2 Parallel Image Convolution Algorithm
Most of the experiments are focused on discovering the difference between the parallel versions of the direct convolution method and the 2D FFT method. For the experiment, we used 64*64, 128*128, and 256*256 Lena images. In the first experiment, we investigate the scalability change as the image size changes in both methods. Figure 5 shows the difference between the two methods. As expected from the theoretical analysis, in both cases scalability decreases as image size decreases, though it is hardly noticeable in the direct convolution method. We can see in the figure that the 2D FFT method's scalability gets much worse than the direct one. It implies that the 2D FFT method's communication time dominates the total execution time more as image size decreases than the direct method's does. This happens because, in the 2D FFT method, the portion of time spent in communication (barrier and read) increases more as image size decreases, which causes the decrease in scalability.
Next we investigate the scalability change as the kernel size changes in both methods. Figure 6 shows the difference between the two methods. As also expected from the theoretical analysis, the direct convolution's scalability decreases as kernel size gets smaller, while the 2D FFT method is never affected by kernel size. We compared the actual execution time of both algorithms with different kernel and image sizes in order to determine a guideline about which method should be used in which case; in other words, how big the kernel size should be to get the benefit of the 2D FFT method. Also, more importantly, this experiment is to see whether the guideline changes as the processor count increases. Figure 7 shows the experiment results. From the figure, we can conclude that the 2D FFT method works faster than the direct convolution method when the kernel size k is larger than 11*11 in the setting where the experiments were performed, and the more important result is that this threshold hardly changes.

Conclusion
In this paper, we explored the scalabilities of two parallel applications and discovered which factors influence thread-level parallelism, and how. In the parallel image convolution, two different implementation


methods were compared. The direct convolution method has less communication load than the 2D FFT method; thus it is less vulnerable to network latency. Its scalability slightly decreases as kernel size gets smaller, but is hardly affected by image size. On the other hand, the 2D FFT method's scalability decreases as image size gets smaller, but it is never affected by kernel size. Also, its scalability is severely affected by network latency because it has a high communication load due to the matrix transpose operation. However, the scalability changes caused by image size, kernel size or network latency variation are not large enough to severely change the guideline about which method should be used. Therefore, we can still determine the guideline on uniprocessors even if both of the convolution algorithms are parallelized. We have given an assessment of the effectiveness of our architecture in providing the significant results presented; much of the efficiency of parallel execution is still in the hands of the developer.

Figure 5: Comparison of scalability with different image sizes. Note that CONV means the direct convolution method, 2DFFT the 2D-FFT method, k the kernel size (k×k), and i the image size (i×i).
Figure 6: Comparison of scalability with different kernel sizes.

Figure 7: Comparison of execution time with different kernel and image size.


References
[1] R. D. Acosta, et al., "An Instruction Issuing Approach to Enhancing Performance in Multiple Functional Unit Processors," IEEE TOC, Sep.
[2] D. W. Anderson, et al., "The IBM System/360 Model 91: Machine Philosophy and Instruction-Handling," IBM J. of Res. & Dev., Jan. 1967.
[3] E. C. Bronson, et al., "Experimental Application-Driven Architecture Analysis of an SIMD/MIMD Parallel Processing System."
[4] R. P. Colwell, et al., "A VLIW Architecture for a Trace Scheduling Compiler," IEEE TOC, Aug. 1988.
[5] S. A. Fineberg, et al., "Mixed-Mode Computing with the PASM System Prototype," Allerton Conf. on Comm., Con., and Comp., 1987, pp.
[6] S. A. Fineberg, et al., "Non-Deterministic Instruction Time Experiments on the PASM System Prototype," ICPP, Aug. 1988, pp. 444-451.
[7] J. A. Fisher, "Trace Scheduling: A Technique for Global Microcode Compaction," IEEE TOC, July 1981, pp. 478-490.
[8] A. Gottlieb, et al., "The NYU Ultracomputer - Designing a MIMD, Shared-Memory Parallel Machine," ISCA, 1982, pp. 27-42.
[9] N. P. Jouppi, "Architectural and Organizational Tradeoffs in the Design of the MultiTitan CPU," ISCA, May 1989, pp. 281-289.
[10] Yung-Lin Liu, Hau-Yang Cheng, Chung-Ta King, "High performance computing on networks of workstations through the exploitation of function parallelism," Journal of Systems Architecture 45, pp. 1307-1321, 1999.
[11] N. P. Jouppi and D. W. Wall, "Available Instruction-Level Parallelism for Superscalar and Superpipelined Machines," ASPLOS, Apr.
[12] D. J. Kuck, et al., "Parallel Supercomputing Today and the Cedar Approach," Science, Feb.
[13] M. Lam, "Software Pipelining: An Effective Scheduling Technique for VLIW Machines," SIGPLAN '88, June 1988, pp. 318-328.
[14] G. F. Pfister, et al., "The IBM Research Parallel Processor Prototype (RP3): Introduction and Architecture," ICPP, 1985, pp. 764-771.
[15] A. R. Pleszkun and G. S. Sohi, "The Performance Potential of Multiple Functional Unit Processors," ISCA, 1988, pp. 37-44.
[16] G. Radin, "The 801 Minicomputer," IBM J. of Res. & Dev., May 1983, pp. 237-246.
[17] M. D. Smith, et al., "Limits on Multiple Instruction Issue," ASPLOS, Apr. 1989, pp. 290-302.
[18] G. S. Sohi and S. Vajapeyam, "Tradeoffs in Instruction Format Design for Horizontal Architectures," ASPLOS, Apr. 1989, pp. 15-25.
[19] M. R. Thistle and B. J. Smith, "A Processor Architecture for Horizon," Supercomputing '88.
[20] G. S. Tjaden and M. J. Flynn, "Detection and Parallel Execution of Independent Instructions."
[21] A. K. Jain, Fundamentals of Digital Image Processing, Prentice Hall, 1999.
[22] S. Yu, M. Clement, Q. Snell and B. Morse, "Parallel Algorithms for Image Convolution," International Conference on Parallel and Distributed Processing Techniques and Applications, 1998.
[23] M. Hegland, Real and Complex Fast Fourier Transforms on the Fujitsu VPP 500, TR-CS-94-07, U. of Canberra, Australia, 1994.

Author Biography:
K. Manjunathachari, Associate Prof., Department of Electronics and Communication, G. Pulla Reddy Engineering College, Kurnool, A.P., India. Pursuing a PhD in Parallel Image Processing from Jawaharlal Nehru Technological University, Hyderabad, India. His research interests include: Image Processing and Compression, Parallel Processing.

Dr. K. Satya Prasad, presently Principal and Professor in Electronics and Communication, Jawaharlal Nehru Technological University College of Engineering, Kakinada, A.P., India.
