International Journal For Technological Research In Engineering Volume 3, Issue 5, January-2016 ISSN (Online): 2347 - 4718

EFFICIENT DYNAMIC VOLTAGE SCALING OF RAZOR BASED MULTIPLIER USING BOOTH ENCODING

T. Veeranjaneyulu¹, A. Manyanaik²
¹PG Student (M.Tech), ²Assistant Professor, Dept. of ECE, Universal College of Engineering & Technology, Guntur, AP, India

Abstract: In this paper, we present a Booth multiplier that combines parallel processing (PP), razor-based dynamic voltage scaling (DVS), and dedicated multi-precision (MP) operand scheduling to provide optimum performance for a variety of operating conditions. The proposed Modified Booth (MB) multiplier is designed for low power: reducing the computational complexity of the multiplier unit lowers the total power consumption and also reduces the area. Razor flip-flops, together with a dithering voltage unit, configure the multiplier to achieve the lowest power consumption. The single-header voltage dithering unit and the razor flip-flops help reduce the voltage margin and the overhead typically associated with DVS to the lowest level. Finally, the proposed high-speed Booth multiplier can further benefit from an operand scheduler that rearranges the input data and hence determines the optimum voltage and frequency operating conditions for minimum power consumption. The design is well suited to FIR filter design for DSP applications.
Keywords: Modified Booth (MB), Dynamic Voltage Scaling (DVS), Computational complexity, Power consumption.

I. INTRODUCTION
With the widespread use of portable computing and communication systems, power consumption has become an important issue in very large scale integration (VLSI) design. Furthermore, multipliers are fundamental building blocks and the bottleneck in terms of performance and power consumption in many multimedia and digital signal processing (DSP) applications. Therefore, it is crucial to develop a multiplier with high performance but low power consumption.

A multiplier is typically designed for a fixed maximum word-length to suit the worst-case scenario. However, the real effective word-lengths of an application vary dramatically. Using an improper word-length may cause performance degradation or inefficient usage of the hardware resources. In addition, minimizing the multiplier power budget requires estimating the optimal operating point, including the clock frequency, supply voltage, and threshold voltage. In most VLSI system designs, the supply voltage is also selected based on the worst-case scenario. To achieve an optimal power/performance ratio, a variable-precision data path is needed to cater for various types of applications. Dynamic Voltage Scaling (DVS) can be used to match the circuit's real workload and further reduce the power consumption.

Given their fairly complex structure and interconnections, multipliers can exhibit a large number of unbalanced paths, resulting in substantial glitch generation and propagation. This spurious switching activity can be mitigated by balancing internal paths through a combination of architectural and transistor-level optimization techniques. In addition to equalizing internal path delays, dynamic power can also be reduced by monitoring the effective dynamic range of the input operands so as to disable unused sections of the multiplier. For example, an 8-bit multiplication can then be computed on a 32-bit Booth multiplier with its unused sections disabled.

Fig-1: Overall multiplier system architecture

An FPGA consists of logic blocks built from logic cells. Each logic cell contains a 4-input look-up table (LUT) used to implement various functions. A look-up table is an array that replaces runtime computation with a simpler array-indexing operation. The savings in processing time can be significant, since retrieving a value from memory is often faster than undergoing an expensive computation or input/output operation. The tables may be pre-calculated and stored in static program storage, calculated as part of a program's initialization phase, or even stored in hardware in application-specific platforms. An FPGA has three main elements, Look-Up Tables (LUT), flip-flops, and the routing

www.ijtre.com Copyright 2015.All rights reserved. 990


matrix, which all work together to create a very flexible device. Look-up tables are how the logic actually gets implemented. A LUT consists of some number of inputs and one output. What makes a LUT powerful is that its output can be programmed for every possible input combination. A LUT consists of a block of RAM that is indexed by the LUT's inputs; the output of the LUT is whatever value is stored in the indexed location of its RAM. Each LUT's output can optionally be connected to a flip-flop, and groups of LUTs and flip-flops are called slices. In this design, the minimum supply voltage required for each multiplication combination (8 x 8, 16 x 16, and 32 x 32 bit) is stored in a LUT. Depending on the given input operands, the corresponding voltage is selected from the LUT, and the corresponding frequency is adjusted by the frequency scaling unit. The selected voltage and frequency are then applied to the MP multiplier together with the input operands, which produces the multiplication result.
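To make the LUT-based selection more concrete, here is a minimal Python sketch of the lookup step. It assumes signed two's-complement operands and the three precisions used in this paper (8, 16, and 32 bits). The names `VF_LUT`, `OperatingPoint`, `effective_precision`, and `select_operating_point` are hypothetical, and every voltage and frequency value in the table is a placeholder, not a result from this design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    vdd_volts: float  # minimum supply voltage for this precision
    freq_mhz: int     # clock frequency applied to the multiplier


# Hypothetical voltage/frequency LUT indexed by operand precision.
# Placeholder values for illustration only; they are NOT measured results.
VF_LUT = {
    8: OperatingPoint(vdd_volts=0.8, freq_mhz=50),
    16: OperatingPoint(vdd_volts=1.0, freq_mhz=25),
    32: OperatingPoint(vdd_volts=1.2, freq_mhz=10),
}


def signed_width(x: int) -> int:
    """Bits needed to hold x in two's complement, including the sign bit."""
    return (x if x >= 0 else ~x).bit_length() + 1


def effective_precision(a: int, b: int) -> int:
    """Smallest supported word length (8, 16 or 32) that holds both operands."""
    width = max(signed_width(a), signed_width(b))
    for precision in sorted(VF_LUT):
        if width <= precision:
            return precision
    raise ValueError("operands wider than 32 bits are not supported")


def select_operating_point(a: int, b: int) -> OperatingPoint:
    """Index the LUT with the detected precision of the operand pair."""
    return VF_LUT[effective_precision(a, b)]


print(select_operating_point(100, -5))  # OperatingPoint(vdd_volts=0.8, freq_mhz=50)
```

In the proposed system the precision check is performed in hardware by the range detector of the input operand scheduler; the sketch only mirrors the indexing idea.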

Fig-3: Modified Booth 8 bit multiplier architecture.

Fig-2: Possible configuration modes of existing MP multiplier
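The Modified Booth multiplier of Fig-3 is built on radix-4 Booth recoding, the encoding that Table-1 tabulates: each overlapping 3-bit group (y[i+1], y[i], y[i-1]) of the multiplier MR, with y[-1] = 0, selects a digit in {-2, -1, 0, +1, +2}, so an N-bit multiplier yields N/2 partial products. The Python sketch below is a behavioral model written for illustration, not the authors' HDL, and the function names are hypothetical. It lists digits LSB-first; scanning the groups from the MSB produces the same digits.

```python
def booth_radix4_digits(multiplier: int, n_bits: int = 8) -> list[int]:
    """Recode a signed n_bits multiplier into n_bits/2 radix-4 Booth digits (LSB-first)."""
    if n_bits % 2:
        raise ValueError("n_bits must be even")
    if not -(1 << (n_bits - 1)) <= multiplier < (1 << (n_bits - 1)):
        raise ValueError("multiplier does not fit in n_bits (two's complement)")
    y = multiplier & ((1 << n_bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # y[-1] = 0
    for i in range(0, n_bits, 2):
        b0 = (y >> i) & 1        # y[i]
        b1 = (y >> (i + 1)) & 1  # y[i+1]
        digits.append(-2 * b1 + b0 + prev)  # digit from the group (y[i+1], y[i], y[i-1])
        prev = b1                # MSB of this pair feeds the next group
    return digits


def booth_multiply(multiplicand: int, multiplier: int, n_bits: int = 8) -> int:
    """Sum the N/2 partial products d_j * MD, each weighted by 4**j."""
    product = 0
    for j, d in enumerate(booth_radix4_digits(multiplier, n_bits)):
        # Partial product is 0, +/-MD or +/-2*MD (2*MD = MD shifted left one column);
        # a negative digit corresponds to the 2's complement of the multiplicand.
        product += (d * multiplicand) << (2 * j)
    return product
```

For an 8-bit multiplier the loop produces exactly four digits, matching the four partial products of the 8-bit design.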

Table-1: MODIFIED BOOTH ENCODING

II. PROPOSED BLOCK FOR MODIFIED BOOTH MULTIPLIER
The proposed MB multiplier system comprises five different modules:
• The MBMP multiplier.
• The input operands scheduler (IOS), whose function is to select the input data stream so that an appropriate multiplication operation is performed as per the application requirement.
• The frequency scaling unit (FSU), which generates the required operating frequency of the multiplier.
• The voltage scaling unit (VSU), implemented using a razor-based voltage dithering technique. Its function is to dynamically generate the supply voltage so as to minimize power consumption.
• The dynamic voltage/frequency management unit (VFMU), which receives the user requirements (e.g., throughput) and sends control signals to the VSU and FSU to generate the required power supply voltage and clock for the MB multiplier.

The modified Booth multiplier produces N/2 partial products, each of which depends on bits of the multiplier. In this paper, we aim to build a Booth-encoding-based multiplier. Modified Booth encoding allows higher-radix parallel operation. Fig-3 illustrates the architecture of the modified Booth 8-bit multiplier. The 16-bit product is obtained from two 8-bit operands, namely the multiplicand (MD) and the multiplier (MR). The architecture comprises four parts: 1) Booth encoder, 2) 2's complement generator, 3) partial product generator, and 4) an adder stage that sums the partial products. The Booth encoder divides the 8-bit multiplier into overlapping three-bit blocks, from MSB to LSB. Hence, each partial product is chosen by considering a pair of bits along with the most significant bit (MSB) of the previous pair. If the MSB of the previous pair is true, the multiplicand must be added to the current partial product. If the MSB of the current pair is true, the current partial product is selected to be negative and the next partial product is incremented. Since this is an 8-bit (N = 8) multiplier, a total of four (N/2) partial products are generated. The final 16-bit output is obtained by adding the partial products using adders. The partial products are generated from the multiplicand, left-shifted by one column when twice the multiplicand is selected, and from its 2's complement when a negative partial product is required.

Implementation of Razor Flip-Flops
Although the worst-case paths are very rarely exercised, traditional DVS approaches still maintain relatively large safety margins to ensure reliable circuit operation, resulting in excessive power dissipation. Razor technology is breakthrough work that eliminates the safety margin by achieving variability tolerance through in-situ error detection and correction [25]. This approach is based on razor flip-flops, which detect and correct delay errors by double sampling. A razor flip-flop is constructed from a standard positive-edge-triggered flip-flop (DFF) augmented with a shadow latch that samples at the negative clock edge. Thus, the input data is given additional time, equal to the duration of the positive clock phase, to settle to its correct state before being sampled by the shadow latch. To ensure that the shadow latch always captures the correct data, the minimum allowable supply voltage must be constrained at design time such that the setup time at the shadow latch is never violated. A comparator flags a timing error when it detects a discrepancy between the speculative data sampled at the main flip-flop and the correct data sampled at the shadow latch.
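The double-sampling behavior can be summarized with a small behavioral model. This Python sketch assumes that data launches at a rising clock edge, that the main flip-flop captures at the next rising edge, and that the shadow latch captures at the following falling edge; setup and hold times are ignored, and the names `razor_sample` and `RazorOutcome` are hypothetical.

```python
from enum import Enum


class RazorOutcome(Enum):
    OK = "main flip-flop captured the correct value"
    CORRECTED = "comparator flagged a mismatch; shadow-latch value is used"
    UNSAFE = "shadow latch also missed the data (supply below safe minimum)"


def razor_sample(path_delay_ns: float, clock_period_ns: float,
                 duty_cycle: float = 0.5) -> RazorOutcome:
    """Classify one capture by a razor flip-flop (idealized timing, no setup/hold)."""
    main_deadline = clock_period_ns                                  # next rising edge
    shadow_deadline = clock_period_ns + duty_cycle * clock_period_ns  # following falling edge
    if path_delay_ns <= main_deadline:
        return RazorOutcome.OK          # main FF and shadow latch agree
    if path_delay_ns <= shadow_deadline:
        return RazorOutcome.CORRECTED   # only the shadow latch saw settled data
    return RazorOutcome.UNSAFE          # both samples wrong: not detectable by the comparator


# A 10 ns clock with a 50% duty cycle gives the shadow latch a 15 ns window.
for delay in (9.0, 12.0, 16.0):
    print(delay, razor_sample(delay, 10.0).name)
```

Delays that fall between the two sampling instants are the cases the comparator flags; delays beyond the shadow-latch instant correspond to a supply voltage below the design-time minimum.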

Fig-4: Three PEs combined to form 32 x 32 bit multiplier

Dynamic Voltage and Frequency Scaling Management
A. Dynamic Voltage Scaling (DVS) Unit
In this implementation, a dynamic power supply and a VCO are employed to achieve real-time dynamic voltage and frequency scaling. Fast voltage scaling can be achieved using voltage dithering, which exhibits a faster response time than a conventional voltage regulator. Voltage dithering uses power switches to connect different supply voltages to the load, depending on the time slot; therefore, an intermediate average voltage is achieved.
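Voltage dithering can be reasoned about as a time-weighted average of the rails connected in each time slot. The Python sketch below computes that ideal average and the fraction of time the higher rail must be connected to reach a target voltage. It ignores ripple, switch resistance, and transition overhead; the function names and the rail values in the example are hypothetical.

```python
def dithered_average_voltage(slots: list[tuple[float, float]]) -> float:
    """Ideal average supply for (voltage, duration) slots connected in turn."""
    total_time = sum(duration for _, duration in slots)
    if total_time <= 0:
        raise ValueError("total duration must be positive")
    return sum(v * duration for v, duration in slots) / total_time


def high_rail_fraction(v_low: float, v_high: float, v_target: float) -> float:
    """Fraction of each period the high rail must be connected to average v_target."""
    if not v_low <= v_target <= v_high or v_low == v_high:
        raise ValueError("target must lie between two distinct rails")
    return (v_target - v_low) / (v_high - v_low)


# Hypothetical rails of 0.8 V and 1.2 V with a 1.0 V target.
frac = high_rail_fraction(0.8, 1.2, 1.0)
print(frac, dithered_average_voltage([(1.2, frac), (0.8, 1.0 - frac)]))
```

The two functions are inverses of each other for a two-rail dither, which is a quick way to sanity-check a chosen slot schedule.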

B. Dynamic Frequency Scaling Unit
The frequency scaling unit of the proposed MP multiplier is used for frequency tuning to meet the system throughput requirements. This unit is implemented using a voltage-controlled oscillator (VCO) built as a seven-stage current-starved ring oscillator. Using four control bits (5 MHz/step), the output frequency of the VCO is tuned from 5 to 50 MHz. One clock cycle is required for the clock frequency to settle. The frequency scaling unit uses the VCO to select the frequency for each combination of multiplication: depending on the control signal, it provides the frequency pre-calculated for 8 x 8-bit, 16 x 16-bit, and 32 x 32-bit multiplication, so that each multiplication completes correctly with minimum delay. The VCO adjusts the frequency depending on the supply voltage, so a suitable frequency can be selected for each combination of multiplication.

Fig-5: Proposed single-header voltage dithering unit

Fig-6: Conceptual view of razor flip-flops
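The paper specifies the VCO range (5 to 50 MHz) and resolution (four control bits, 5 MHz per step) but not the exact code-to-frequency mapping. The Python sketch below assumes a linear mapping f = 5 MHz + 5 MHz * code that saturates at 50 MHz, so only ten of the sixteen codes are distinct, and picks the lowest code whose frequency meets a required rate. `vco_frequency_mhz` and `code_for_throughput` are hypothetical names.

```python
F_MIN_MHZ = 5
STEP_MHZ = 5
F_MAX_MHZ = 50


def vco_frequency_mhz(code: int) -> int:
    """Assumed linear mapping of a 4-bit control code to the VCO output frequency."""
    if not 0 <= code <= 0b1111:
        raise ValueError("control code must fit in four bits")
    return min(F_MIN_MHZ + STEP_MHZ * code, F_MAX_MHZ)  # codes 9..15 saturate at 50 MHz


def code_for_throughput(required_mhz: float) -> int:
    """Lowest control code whose frequency meets the required clock rate."""
    for code in range(16):
        if vco_frequency_mhz(code) >= required_mhz:
            return code
    raise ValueError("required frequency exceeds the 50 MHz VCO range")


print(code_for_throughput(12), vco_frequency_mhz(code_for_throughput(12)))
```

Choosing the lowest sufficient frequency leaves the most timing slack, which is what allows the voltage scaling unit to lower the supply voltage.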



Input Operand Scheduler Unit
The input operands scheduler rearranges the input data and hence reduces the number of supply-voltage transitions, thus reducing power consumption. It consists of a range detector, a buffer (RAM), and a voltage and frequency analyzer, which together rearrange the input, detect its precision, and send it to the MB multiplier. The proposed IOS performs the following tasks: 1) it reorders the input data stream so that same-precision operands are grouped together in a buffer, and 2) it takes the minimum supply voltage and frequency from the LUT. Fig 5 shows the Input Operand Scheduler. The operation of the multiplier is controlled by two external signals, i.e., the operating frequency and the voltage signal. These two signals are tuned to the correct values depending on the actual workload, i.e., on the input operands. The design is simulated by applying input operands and comparing the results with reference results computed on a PC; timing is also verified. The multi-precision multiplication supports data word lengths of up to 32 bits.

Algorithm for IOS: There are three different algorithms to reduce the overall power consumption, algorithms A, B, and C; each constitutes a different approach to handling the mixed-precision data held in the operands buffer. The performance of each algorithm is evaluated using a mixed-precision data set with operands corresponding to each precision (8, 16, and 32-bit).

Algorithm A: The voltage and frequency levels vary for each precision (8, 16, and 32-bit).

Fig-7: Timing diagram for razor flip-flops

Algorithm B: This algorithm removes all transitions of the power supply voltage by making Vmin32, Vmin16, and Vmin8 equal and adjusting f32, f16, and f8 such that the overall throughput is kept unchanged.

Algorithm C: Compared with Algorithm A, the only modification is that the frequency is changed inversely while the voltage remains the same.

Comparison table for the MP multiplier and MB multiplier:

Parameters                               | Proposed (MP multiplier) | Extension (MB multiplier)
Delay (ns)                               | 4.498                    | 3.850
Power (W)                                | 1.81                     | 0.178
Area: Number with an unused flip-flop    | 5 out of 8 (62%)         | 5 out of 8 (62%)
Area: Number of bonded IOBs              | 36 out of 600 (6%)       | 35 out of 600 (5%)
Area: Number of BUFG/BUFGCTRLs           | 1 out of 32 (3%)         | 1 out of 32 (3%)
Maximum frequency (MHz)                  | 771.367                  | 771.367
Logic delay (ns)                         | 2.392                    | 1.270
Routing delay (ns)                       | 0.559                    | 2.580
Total number of paths                    | 324 / 18                 | 32962 / 18

Table-2: Comparison table for the MP multiplier and MB multiplier

III. RESULTS AND DISCUSSIONS
Simulation results of MB multiplier:

Fig-8: 16*16 MB multiplier

Fig-9: 32*32 MB multiplier

IV. CONCLUSION
In this paper, a Modified Booth (MB) multiplier is designed for low power: reducing the computational complexity of the multiplier unit lowers the total power consumption and also reduces the area. Razor flip-flops together with a dithering voltage unit then configure the
multiplier to achieve the lowest power consumption. The design is well suited to FIR filter design for DSP applications. The proposed novel dedicated operand scheduler rearranges operations on input operands to reduce the number of supply-voltage transitions and, in turn, minimizes the overall power consumption of the multiplier. The proposed MB razor-based DVS multiplier provides a solution toward achieving full computational flexibility and low power consumption for various general-purpose low-power applications.

REFERENCES
[1] H. Lee, "A power-aware scalable pipelined booth multiplier," in Proc. IEEE Int. SOC Conf., Sep. 2004, pp. 123–126.
[2] S.-R. Kuang and J.-P. Wang, "Design of power-efficient configurable booth multiplier," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 3, pp. 568–580, Mar. 2010.
[3] T. Yamanaka and V. G. Moshnyaga, "Reducing multiplier energy by data-driven voltage variation," in Proc. IEEE Int. Symp. Circuits Syst., May 2004, pp. 285–288.
[4] M. Nakai, S. Akui, K. Seno, T. Meguro, T. Seki, T. Kondo, A. Hashiguchi, H. Kawahara, K. Kumano, and M. Shimura, "Dynamic voltage and frequency management for a low-power embedded ," IEEE J. Solid-State Circuits, vol. 40, no. 1, pp. 28–35, Jan. 2005.
[5] J.-Y. Kang and J.-L. Gaudiot, "A simple high-speed multiplier design," IEEE Trans. Comput., vol. 55, no. 10, pp. 1253–1258, Oct. 2006.
[6] S. D. Haynes, A. Ferrari, and P. Y. K. Cheung, "Flexible reconfigurable multiplier blocks suitable for enhancing the architecture of FPGAs," in Proc. IEEE Custom Integr. Circuits Conf., May 1999, pp. 191–194.

Authors Profile:

T. Veeranjaneyulu is pursuing his M.Tech in the Department of Electronics and Communication Engineering at Universal College of Engineering & Technology, Guntur. His specialization is VLSID.

A. Manya Naik is an Assistant Professor in the Department of Electronics and Communication Engineering at Universal College of Engineering & Technology, Guntur. His interested area of .
